POSITIONAL SEQUENCING BY HYBRIDIZATION

Reference to Related Applications 

This is a continuation-in-part of United States patent application serial 
number 07/972,012 filed November 6, 1992. 

Background of the Invention 
1. Field of the Invention 

This invention relates to methods for sequencing nucleic acids by 
positional hybridization and to procedures combining these methods with more 
conventional sequencing techniques and with other molecular biology techniques 
including techniques utilized in PCR (polymerase chain reaction) technology. Useful
applications include the creation of probes and arrays of probes for detecting,
identifying, purifying and sequencing target nucleic acids in biological samples. The
invention is also directed to novel methods for the replication of probe arrays, to the
replicated arrays, and to diagnostic aids comprising nucleic acid probes and arrays useful for
screening biological samples for target nucleic acids and nucleic acid variations.

2. Description of the Background 

Since the recognition of nucleic acid as the carrier of the genetic code, a 
great deal of interest has centered around determining the sequence of that code in the 
many forms which it is found. Two landmark studies made the process of nucleic acid 
sequencing, at least with DNA, a common and relatively rapid procedure practiced in
most laboratories. The first describes a process whereby terminally labeled DNA 
molecules are chemically cleaved at single base repetitions (A.M. Maxam and W. 
Gilbert, Proc. Natl. Acad. Sci. USA 74:560-564, 1977). Each base position in the nucleic 
acid sequence is then determined from the molecular weights of fragments produced by 
partial cleavages. Individual reactions were devised to cleave preferentially at guanine,
at adenine, at cytosine and thymine, and at cytosine alone. When the products of these 
four reactions are resolved by molecular weight, using, for example, polyacrylamide gel 
electrophoresis, DNA sequences can be read from the pattern of fragments on the 
resolved gel. 



The second study describes a procedure whereby DNA is sequenced using 
a variation of the plus-minus method (F. Sanger et al., Proc. Natl. Acad. Sci. USA 
74:5463-67, 1977). This procedure takes advantage of the chain terminating ability of
dideoxynucleoside triphosphates (ddNTPs) and the ability of DNA polymerase to
incorporate ddNTPs with nearly equal fidelity as the natural substrate of DNA
polymerase, deoxynucleoside triphosphates (dNTPs). A primer, usually an
oligonucleotide, and a template DNA are incubated together in the presence of a useful
concentration of all four dNTPs plus a limited amount of a single ddNTP. The DNA
polymerase occasionally incorporates a dideoxynucleotide which terminates chain
extension. Because the dideoxynucleotide has no 3'-hydroxyl, the initiation point for the
polymerase enzyme is lost. Polymerization produces a mixture of fragments of varied
sizes, all having identical 3' termini. Fractionation of the mixture by, for example,
polyacrylamide gel electrophoresis, produces a pattern which indicates the presence and 
position of each base in the nucleic acid. Reactions with each of the four ddNTPs 
allow one of ordinary skill to read an entire nucleic acid sequence from a resolved gel.

Despite their advantages, these procedures are cumbersome and 
impractical when one wishes to obtain megabases of sequence information. Further,
these procedures are, for all practical purposes, limited to sequencing DNA. Although 
variations have developed, it is still not possible using either process to obtain sequence 
information directly from any other form of nucleic acid. 

A new method of sequencing has been developed which overcomes some 
of the problems associated with current methodologies wherein sequence information
is obtained in multiple discrete packages. Instead of having a particular nucleic acid
sequenced one base at a time, groups of contiguous bases are determined simultaneously 
by hybridization. There are many advantages including increased speed, reduced 
expense and greater accuracy. 

WO 94/11530    PCT/US93/10616

Two general approaches of sequencing by hybridization have been
suggested. Their practicality has been demonstrated in pilot studies. In one format, a
complete set of 4^n oligonucleotides of length n is immobilized as an ordered array on a solid
support and an unknown DNA sequence is hybridized to this array (K.R. Khrapko et
al., J. DNA Sequencing and Mapping 1:375-88, 1991). The resulting hybridization
pattern provides all n-tuple words in the sequence. This is sufficient to determine short
sequences except for simple tandem repeats.
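
To illustrate why the set of n-tuple words suffices for short sequences but not for simple tandem repeats, the following minimal sketch computes the words of a sequence and attempts a reconstruction by unique overlap extension. The function names `spectrum` and `reconstruct` and the example sequences are assumptions made purely for illustration; they are not taken from the cited studies, and real hybridization data would also contain mismatches and background signal.

```python
from typing import Optional, Set


def spectrum(sequence: str, n: int) -> Set[str]:
    """Return the set of all n-tuple words (n-mers) present in the sequence."""
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


def reconstruct(words: Set[str], start: str) -> Optional[str]:
    """Extend from `start` using unique (n-1)-base overlaps.

    Assumes n >= 2. Returns None when an extension is ambiguous
    (a branch point) or impossible (a dead end).
    """
    overlap = len(start) - 1
    remaining = set(words) - {start}
    result = start
    while remaining:
        suffix = result[-overlap:]
        candidates = [w for w in remaining if w.startswith(suffix)]
        if len(candidates) != 1:
            return None
        result += candidates[0][-1]
        remaining.remove(candidates[0])
    return result


# A short sequence without repeats is recovered from its 3-tuple words.
target = "ATGCCGTA"
recovered = reconstruct(spectrum(target, 3), "ATG")

# Tandem repeats of different lengths give the same set of words,
# so the repeat length cannot be recovered from the pattern alone.
same_words = spectrum("ACACACAC", 3) == spectrum("ACACAC", 3)
```

In this sketch the first target is recovered exactly, while the two tandem repeats are indistinguishable because each word is recorded only as present or absent.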

In the second format, an array of immobilized samples is hybridized with 
one short oligonucleotide at a time (Z. Strezoska et al., Proc. Natl. Acad. Sci. USA
88:10,089-93, 1991). When repeated 4^n times for each oligonucleotide of length n, much
of the sequence of all the immobilized samples would be determined. In both
approaches the intrinsic power of the method is that many sequenced regions are
determined in parallel. In actual practice the array size is about 10^4 to 10^5.

Another powerful aspect of the method is that information obtained is 
quite redundant, especially as the size of the nucleic acid probe grows. Mathematical
simulations have shown that the method is quite resistant to experimental errors and
that far fewer than all probes are necessary to determine reliable sequence data (P.A.
Pevzner et al., J. Biomol. Struc. & Dyn. 9:399-410, 1991; W. Bains, Genomics 11:295-
301, 1991).

In spite of an overall optimistic outlook, there are still a number of
potentially severe drawbacks to actual implementation of sequencing by hybridization.
First and foremost among these is that 4^n rapidly becomes quite a large number if
chemical synthesis of all of the oligonucleotide probes is actually contemplated. Various
schemes of automating this synthesis and compressing the products into a small scale 
array, a sequencing chip, have been proposed. 

A second drawback is the poor level of discrimination between a correctly
hybridized, perfectly matched duplex and an end mismatch. In part, these drawbacks
have been addressed at least to a small degree by the method of continuous stacking
hybridization as reported by Khrapko et al. (FEBS Lett. 256:118-22, 1989).
Continuous stacking hybridization is based upon the observation that when a single-
stranded oligonucleotide is hybridized adjacent to a double-stranded oligonucleotide the
two duplexes are mutually stabilized as if they are positioned side-to-side due to a [...]

[...] of chosen length on a solid support, and the nucleotide sequence of the target
determined, at least partially, from knowledge of the sequence of the bound fragments
and the pattern of hybridization observed. Although promising, as a practical matter,
this method has numerous drawbacks. Probes are entirely single-stranded and binding
stability is dependent upon the size of the duplex. However, every additional nucleotide
of the probe necessarily increases the size of the array fourfold, creating a dichotomy
which severely restricts its plausible use. Further, there is an inability to deal with branch
point ambiguities or secondary structure of the target, and hybridization conditions will
have to be tailored or in some way accounted for, for each binding event.

R. Drmanac et al. (U.S. Patent No. 5,202,231; which is specifically
incorporated by reference) is directed to methods for sequencing by hybridization using
sets of oligonucleotide probes with random sequences. These probes, although useful,
suffer from some of the same drawbacks as the methodology of Southern (1989), and
like Southern, fail to recognize the advantages of stacking interactions.

K.R. Khrapko et al. (FEBS Lett. 256:118-22, 1989; and J. DNA
Sequencing and Mapping 1:375-88, 1991) attempt to address some of these problems
using a technique referred to as continuous stacking hybridization. With continuous
stacking, conceptually, the entire sequence of a target nucleic acid can be determined.
Basically, the target is hybridized to an array of probes, again single-stranded, denatured
from the array, and the dissociation kinetics of denaturation analyzed to determine the
target sequence. Although also promising, discrimination between matches and
mismatches (and simple background) is low, and further, as hybridization conditions are
inconstant for each duplex, discrimination becomes increasingly reduced with increasing
target complexity.

Summary of the Invention

The present invention overcomes the problems and disadvantages
associated with current strategies and designs and provides new methods for rapidly and
accurately determining the nucleotide sequence of a nucleic acid by the herein described
methods of positional sequencing by hybridization.

One embodiment of the invention is directed to arrays of 4^R different
nucleic acid probes wherein each probe comprises a double-stranded portion of length
D, a terminal single-stranded portion of length S, and a random nucleotide sequence
within the single-stranded portion of length R. These arrays may be bound to solid
supports and are useful for determining the nucleotide sequence of unknown nucleic
acids and for the detection, identification and purification of target nucleic acids in
biological samples.

Another embodiment of the invention is directed to methods for creating
arrays of probes comprising the steps of synthesizing a first set of nucleic acids each
comprising a constant sequence of length C at the 3'-terminus, and a random sequence
of length R at the 5'-terminus, synthesizing a second set of nucleic acids each comprising
a sequence complementary to the constant sequence of the first nucleic acid, and
hybridizing the first set with the second set to form the array.

Another embodiment of the invention is directed to methods for creating
arrays of probes comprising the steps of synthesizing a set of nucleic acids each
containing a random internal sequence of length R flanked by the cleavage sites of a
restriction enzyme, synthesizing a set of primers each complementary to a non-random
sequence of the nucleic acid, hybridizing the two sets together to form hybrids, extending
the sequence of the primer by polymerization using the nucleic acid as a template, and
cleaving the hybrids with the restriction enzyme to form an array of probes with a
double-stranded portion and a single-stranded portion and with the random sequence
within the single-stranded portion.

Another embodiment of the invention is directed to replicated arrays and
methods for replicating arrays of probes, preferably on a solid support, comprising the
steps of synthesizing an array of nucleic acids each comprising a constant sequence of
length C at a 3'-terminus and a random sequence of length R at a 5'-terminus, fixing the
array to a first solid support, synthesizing a set of nucleic acids each comprising a
sequence complementary to the constant region of the array, hybridizing the nucleic
acids of the set with the array, enzymatically extending the nucleic acids of the set using
the random sequences of the array as templates, denaturing the set of extended nucleic
acids, and fixing the denatured nucleic acids of the set to a second solid support to
create the replicated array of probes. The replicated array may be single-stranded or
double-stranded, it may be fixed to a solid support or free in solution, and it is useful
for sequencing, detecting or simply identifying target nucleic acids.

The array is also useful for the purification of nucleic acid from a complex
mixture for later identification and/or sequencing. A purification array comprises 
sufficient numbers of probes to hybridize and thereby effectively capture the target 
sequences from a complex sample. The hybridized array is washed to remove non-target 
nucleic acids and any other materials which may be present and the target sequences 
eluted by denaturing. From the elution, purified or semi-purified target sequences are 
obtained and collected. This collection of target sequences can then be subjected to
normal sequencing methods or sequenced by the methods described herein.

Another embodiment of the invention is directed to nucleic acid probes
and methods for creating nucleic acid probes comprising the steps of synthesizing a
plurality of single-stranded first nucleic acids and a plurality of longer single-stranded
second nucleic acids wherein each second nucleic acid comprises a random
terminal sequence and a sequence complementary to a sequence of the first nucleic
acids, hybridizing the first nucleic acids to the second to form partial duplexes having
a double-stranded portion and a single-stranded portion with the random sequence
within the single-stranded portion, hybridizing a target nucleic acid to the partial
duplexes, optionally ligating the hybridized target to the first nucleic acid of the partial
duplexes, isolating the second nucleic acid from the ligated duplexes, synthesizing a
plurality of third nucleic acids each complementary to the constant sequence of the
second nucleic acid, and hybridizing the third nucleic acids with the isolated second
nucleic acids to create the nucleic acid probe. Alternatively, after formation of the
partial duplexes, the target is ligated as before and hybridized with a set of
oligonucleotides comprising random sequences. These oligonucleotides are ligated to
the second nucleic acid, the second nucleic acid is isolated, another plurality of first
nucleic acids is synthesized, and the first nucleic acids are hybridized to the
oligonucleotide-ligated second nucleic acids to form the probe. Ligation allows for
hybridization to be performed under a single set of hybridization conditions. Probes
may be fixed to a solid support and may also contain enzyme recognition sites within
their sequences.

Another embodiment of the invention is directed to diagnostic aids and 
methods utilizing probe arrays for the detection and identification of target nucleic acids
in biological samples and to methods for using the diagnostic aids to screen biological
samples. Diagnostic aids as described are also useful for the purification of identified
targets and, if desired, for their sequencing. These aids comprise probes, solid supports,
labels, necessary reagents and the biological samples. 

Other advantages of the invention are set forth in part in the description 
which follows, and in part, will be obvious from this description, or may be learned from 
the practice of this invention. The accompanying drawings which are incorporated in 
and constitute a part of this specification, illustrate and, together with this description,
serve to explain the principle of the invention. 

Brief Description of the Drawings

Figure 1   Energetics of stacking hybridization. Structures consist of a long target
           and a probe of length n. The top three samples are ordinary hybridization
           and the bottom three are stacking hybridization.
Figure 2   (A) The first step of the basic scheme for positional sequencing by
           hybridization depicting the hybridization of target nucleic acid with probe
           forming a 5' overhang of the target.
           (B) The first step of the alternate scheme for positional sequencing by
           hybridization depicting the hybridization of target nucleic acid with probe
           forming a 3' overhang of the probe.
Figure 3   Graphic representation of the ligation step of positional sequencing by
           hybridization wherein hybridization of the target nucleic acid produces
           (A) a 5' overhang or (B) a 3' overhang.
Figure 4   Preparation of a random probe array.
Figure 5   Single nucleotide extension of a probe hybridized with a target nucleic
           acid using DNA polymerase and a single dideoxynucleotide.
Figure 6   Preparation of a nested set of targets using labeled target nucleic acids
           partially digested with exonuclease III.
Figure 7   Determination of positional information using the ratio of internal label
           to terminal label.
Figure 8   (A) Extension of one strand of the probe using a hybridized target as
           template with a single deoxynucleotide.
           (B) Hybridization of target with a fixed probe followed by ligation of
           probe to target.
Figure 9   Four color analysis of sequence extensions of the 3' end of a probe using
           three labeled nucleoside triphosphates and one unlabeled chain
           terminator.
Figure 10  Extension of a nucleic acid probe by ligation of a pentanucleotide 3'
           blocked to prevent polymerization.
Figure 11  Preparation of a customized probe containing a 10 base pair sequence
           that was present in the original target nucleic acid.
Figure 12  Graphic representation of the general procedure of positional sequencing
           by hybridization.
Figure 13  Graphical representation of the ligation efficiency of positional
           sequencing. Depicted is the relationship between the amount of label
           remaining over the total amount of label in the reaction, versus NaCl
           concentration.
Figure 14  A diagrammatic representation of the construction of a complementary
           array of master beads.
Description of the Invention

The present invention overcomes the problems and disadvantages
associated with current strategies and designs and provides new methods and probes,
new diagnostic aids and methods for using the diagnostic aids, and new arrays and
methods for creating arrays of probes to detect, identify, purify and sequence target
nucleic acids. Nucleic acids of the invention include sequences of deoxyribonucleic acid
(DNA) or ribonucleic acid (RNA) which may be isolated from natural sources,
recombinantly produced, or artificially synthesized. Preferred embodiments of the
present invention are probes synthesized using traditional chemical synthesis, using the
more rapid polymerase chain reaction (PCR) technology, or using a combination of
these two methods.

Nucleic acids of the invention also include polyamide nucleic acid
(PNA) or any sequence of what are commonly referred to as bases joined by a chemical
backbone that have the ability to base pair or hybridize with a complementary chemical
structure. The bases of DNA, RNA, and PNA are purines and pyrimidines linearly
linked to a chemical backbone. Common chemical backbone structures are deoxyribose
phosphate and ribose phosphate. Recent studies demonstrated that a number of
additional structures may also be effective, such as the polyamide backbone of PNA
(P.E. Nielsen et al., Sci. 254:1497-1500, 1991).

The purines found in both DNA and RNA are adenine and guanine, but
others known to exist are xanthine, hypoxanthine, 2- and 1-diaminopurine, and other
more modified bases. The pyrimidines are cytosine, which is common to both DNA and
RNA, uracil found predominantly in RNA, and thymidine which occurs exclusively in
DNA. Some of the more atypical pyrimidines include methylcytosine,
hydroxymethylcytosine, methyluracil, hydroxymethyluracil, dihydroxypentyluracil, and other base
modifications. These bases interact in a complementary fashion to form base-pairs, such
as, for example, guanine with cytosine and adenine with thymidine. However, this
invention also encompasses situations in which there is nontraditional base pairing such
as Hoogsteen base pairing which has been identified in certain tRNA molecules and
postulated to exist in a triple helix.

One embodiment of the invention is directed to a method for determining 
a nucleotide sequence by positional hybridization comprising the steps of (a) creating 
a set of nucleic acid probes wherein each probe has a double-stranded portion, a single-
stranded portion, and a random sequence within the single-stranded portion which is
determinable, (b) hybridizing a nucleic acid target which is at least partly single-stranded 
to the set of nucleic acid probes, and (c) determining the nucleotide sequence of the 
target which hybridized to the single-stranded portion of any probe. The set of nucleic 
acid probes and the target nucleic acid may comprise DNA, RNA, PNA, or any 
combination thereof, and may be derived from natural sources, recombinant sources, or 
be synthetically produced. Each probe of the set of nucleic acid probes has a double- 
stranded portion which is preferably about 10 to 30 nucleotides in length, a single- 
stranded portion which is preferably about 4 to 20 nucleotides in length, and a random 
sequence within the single-stranded portion which is preferably about 4 to 20 nucleotides 
in length and more preferably about 5 nucleotides in length. A principal advantage of
this probe is in its structure. Hybridization of the target nucleic acid is encouraged due 
to the favorable thermodynamic conditions established by the presence of the adjacent 
double-strandedness of the probe. An entire set of probes contains at least one example 
of every possible random nucleotide sequence. 

By way of example only, if the random portion consisted of a four
nucleotide sequence (R=4) of adenine, guanine, thymine, and cytosine, the total
number of possible combinations (4^R) would be 4^4 or 256 different nucleic acid probes.
If the number of nucleotides in the random sequence was five, the number of different
probes within the set would be 4^5 or 1,024. This becomes a very large number indeed
when considering sequences of 20 nucleotides or more.
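
The arithmetic above can be checked with a minimal sketch. The helper names `probe_set_size` and `random_regions` are hypothetical and chosen only for illustration; the sketch assumes the four standard bases A, C, G and T and simply enumerates every possible random region of length R.

```python
from itertools import product


def probe_set_size(r: int) -> int:
    """Number of distinct random regions of length r over four bases (4^r)."""
    return 4 ** r


def random_regions(r: int):
    """Yield every possible random region of length r, e.g. 'AAAA', 'AAAC', ..."""
    return ("".join(bases) for bases in product("ACGT", repeat=r))


sizes = {r: probe_set_size(r) for r in (4, 5, 20)}
all_tetramers = list(random_regions(4))
```

For R = 20 the count is 4^20, which exceeds one trillion distinct probes and shows how quickly a complete set becomes impractical to synthesize.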

However, to determine the complete sequence of a nucleic acid target, the
set of probes need not contain every possible combination of nucleotides of the random
sequence to be encompassed by the method of this invention. This variation of the
invention is based on the theory of degenerated probes proposed by S.C. Macevicz
(International Patent Application, US89-04741, published 1989, and herein specifically
incorporated by reference). The probes are divided into four subsets. In each, one of
the four bases is used at a defined number of positions and all other bases except that
one on the remaining positions. Probes from the first subset contain two elements, A
and non-A (A = adenosine). For a nucleic acid sequence of length k, there are 4(2^k -
1), instead of 4^k probes. Where k = 8, a set of probes would consist of only 1020
different members instead of the entire set of 65,536. The savings in time and expense
would be considerable. In addition, it is also a method of the present invention to
utilize probes wherein the random nucleotide sequence contains gapped segments, or
positions along the random sequence which will base pair with any nucleotide or at least
not interfere with adjacent base pairing.
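
The saving quoted for degenerate probes follows directly from the two formulas in the text. The sketch below evaluates both; the function names are hypothetical, and the code only reproduces the counts 4(2^k - 1) and 4^k without modeling how the degenerate subsets are built.

```python
def degenerate_probe_count(k: int) -> int:
    """Probes needed with four base/non-base subsets: 4 * (2^k - 1)."""
    return 4 * (2 ** k - 1)


def full_probe_count(k: int) -> int:
    """Probes needed for the complete set of k-mers: 4^k."""
    return 4 ** k


k = 8
degenerate = degenerate_probe_count(k)
complete = full_probe_count(k)
fold_reduction = complete / degenerate
```

For k = 8 this gives 1020 probes against 65,536, roughly a 64-fold reduction in the number of probes to synthesize.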

Hybridization between complementary bases of DNA, RNA, PNA, or
combinations of DNA, RNA and PNA, occurs under a wide variety of conditions such
as variations in temperature, salt concentration, electrostatic strength, and buffer
composition. Examples of these conditions and methods for applying them are
described in Nucleic Acid Hybridization: A Practical Approach (B.D. Hames and S.J.
Higgins, editors, IRL Press, 1985), which is herein specifically incorporated by reference.
It is preferred that hybridization takes place between about 0°C and about 70°C, for
periods of from about 5 minutes to hours, depending on the nature of the sequence to
be hybridized and its length. For example, typical hybridization conditions for a mixture
of two 20-mers are to bring the mixture to 68°C and let it cool to room temperature (22°C)
for five minutes or at very low temperatures such as 2°C in 2 microliters. It is also
preferred that hybridization between nucleic acids be facilitated using buffers such as
saline, Tris-EDTA (TE), Tris-HCl and other aqueous solutions, certain reagents and
chemicals. Preferred examples of these reagents include single-stranded binding
proteins such as Rec A protein, T4 gene 32 protein, E. coli single-stranded binding
protein, and major or minor nucleic acid groove binding proteins. Preferred examples
of other reagents and chemicals include divalent ions, polyvalent ions, and intercalating
substances such as ethidium bromide, actinomycin D, psoralen, and angelicin.

The nucleotide sequence of the random portion of each probe is
determinable by methods which are well-known in the art. Two methods for
determining the sequence of the nucleic acid probe are by chemical cleavage, as
disclosed by Maxam and Gilbert (1977), and by chain extension using ddNTPs, as
disclosed by Sanger et al. (1977), both of which are herein specifically incorporated by
reference. Alternatively, another method for determining the nucleotide sequence of
a probe is to individually synthesize each member of a probe set. The entire set would
comprise every possible sequence within the random portion or some smaller portion
of the set. The method of the present invention could then be conducted with each
member of the set. Another procedure would be to synthesize one or more sets of
nucleic acid probes simultaneously on a solid support. Preferred examples of a solid
support include a plastic, a ceramic, a metal, a resin, a gel, and a membrane. A more
preferred embodiment comprises a two-dimensional or three-dimensional matrix, such
as a gel, with multiple probe binding sites, such as a hybridization chip as described by
Pevzner et al. (J. Biomol. Struc. & Dyn. 9:399-410, 1991), and by Maskos and Southern
(Nuc. Acids Res. 20:1679-84, 1992), both of which are herein specifically incorporated
by reference. Nucleic acids are bound to the solid support by covalent binding such as
by conjugation with a coupling agent, or by non-covalent binding such as an electrostatic
interaction or antibody-antigen coupling. Typical coupling agents include biotin/
streptavidin, Staphylococcus aureus protein A/IgG antibody fragment, and
streptavidin/protein A chimeras (T. Sano and C.R. Cantor, Bio/Technology 9:1378-81,
1991).

Hybridization chips can be used to construct very large probe arrays which 
are subsequently hybridized with a target nucleic acid. Analysis of the hybridization
pattern of the chip provides an immediate fingerprint identification of the target
nucleotide sequence. Patterns can be manually or computer analyzed, but it is clear that
positional sequencing by hybridization lends itself to computer analysis and automation. 
Algorithms and software have been developed for sequence [...]
applicable to the methods described herein (R. D[...]
5:1085-1102, 1991; P.A. Pevzner, [...] Struc. & Dyn. [...]
herein specifically [...].

[...] whereby one of ordinary skill can incorporate [...]
example, enzymes used in molecular biology [...]
substrate into nucleic acid. [...] radioisotope labeled [...]
labeling isotope is [...]. Other, more
advanced methods of detection [...]
sold by Pharmacia, or other suitable [...]
and the target nucleic acid detected [...] may be labeled [...]
with the labeled probe [...]
to a solid support. [...]
biological sample containing nucleic [...]
nucleic acid, the target nucleic acid is identified [...].

[...] determining a [...]
acid with a first detectable label [...] the nucleic [...]
second detectable label [...]
portions of the nucleic [...] nucleotide sequences of
portions to the nucleic acid [...] relationship of the nucleotide sequence [...]
detectable label [...] first detectable label and the second [...]
determining the nucleotide sequence of the nucleic acid.
Fragments of target nucleic acids labeled both terminally and internally can be
distinguished based on the relative amounts of each label within respective fragments.
Fragments of a target nucleic acid terminally labeled with a first detectable label will
have the same amount of label as fragments which include the labeled terminus.
However, these fragments will have variable amounts of the internal label directly
proportional to their size and distance from the terminus. By comparing the relative
amount of the first label to the relative amount of the second label in each fragment,
one of ordinary skill is able to determine the position of the fragment or the position
of the nucleotide sequence of that fragment within the whole nucleic acid.
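
The positional reasoning above can be sketched numerically. This is an illustrative model only, under stated assumptions: the internal label is taken to be incorporated uniformly along the target so that its signal scales linearly with fragment length, every fragment carries exactly one terminal label, and a full-length target of known length is available as a calibration standard. The function name `fragment_length` and all numeric values are hypothetical.

```python
def fragment_length(internal: float, terminal: float,
                    full_internal: float, full_terminal: float,
                    full_length: int) -> float:
    """Estimate fragment length from the internal/terminal label ratio.

    The terminal signal counts molecules; the internal signal grows with
    molecules times length. Dividing the two cancels the number of
    molecules, and the full-length standard calibrates the ratio.
    """
    if terminal <= 0 or full_terminal <= 0 or full_internal <= 0:
        raise ValueError("label signals must be positive")
    ratio = internal / terminal
    full_ratio = full_internal / full_terminal
    return full_length * ratio / full_ratio


# Hypothetical signals: a 500-nucleotide standard and one nested fragment.
estimate = fragment_length(internal=300.0, terminal=6.0,
                           full_internal=1000.0, full_terminal=10.0,
                           full_length=500)
```

Because each fragment of a nested set includes the labeled terminus, the estimated length also gives the distance of the fragment's other end from that terminus.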

Another embodiment of the invention is directed to methods for
determining a nucleotide sequence by hybridization comprising the steps of creating a
set of nucleic acid probes wherein each probe has a double-stranded portion, a single-
stranded portion, and a random sequence within the single-stranded portion which is
determinable, hybridizing a nucleic acid target which is at least partly single-stranded to
the set, ligating the hybridized target to the probe, and determining the nucleic sequence
of the target which is hybridized to the single-stranded portion of any probe. This
embodiment adds a step wherein the hybridized target is ligated to the probe. Ligation
of the target nucleic acid to the complementary probe increases fidelity of hybridization
and allows for incorrectly hybridized target to be easily washed from correctly hybridized
target (Figure 11). More importantly, the addition of a ligation step allows for
hybridizations to be performed under a single set of hybridization conditions. For
example, hybridization temperature is preferably between about 22-37°C, the salt
concentration useful is preferably between about 0.05-0.5 M, and the period of
hybridization is between about 1-14 hours. This is not possible using the methodologies
of the current procedures which do not employ a ligation step and represents a very
substantial improvement. Ligation can be accomplished using a eukaryotic derived or
a prokaryotic derived ligase. Preferred is T4 DNA or RNA ligase. Methods for use of
these and other nucleic acid modifying enzymes are described in Current Protocols in
Molecular Biology (F.M. Ausubel et al., editors, John Wiley & Sons, 1989), which is
herein specifically incorporated by reference.

There are a number of distinct advantages to the incorporation of a
ligation step. First and foremost is that one can use identical hybridization conditions
for hybridization. Variations of hybridization conditions due to base composition are no
longer relevant as nucleic acids with high A/T or G/C content ligate with equal
efficiency. Consequently, discrimination is very high between matches and mismatches,
much higher than has been achieved using other methodologies such as Southern (1989)
wherein the effects of G/C content were only somewhat neutralized in high
concentrations of quaternary or tertiary amines (e.g., 3M tetramethyl ammonium
chloride in Drmanac et al., 1993).
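
The dependence on base composition that the ligation step is said to remove can be illustrated with the Wallace rule, a widely used rule of thumb that is not part of this specification. It estimates the melting temperature of a short oligonucleotide duplex as 2°C per A or T plus 4°C per G or C. The function name `wallace_tm` and the example sequences are assumptions for illustration, and the rule is only a rough approximation for short oligonucleotides.

```python
def wallace_tm(oligo: str) -> int:
    """Rough melting temperature (°C) of a short duplex by the Wallace rule."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


# Two octamers of equal length but opposite base composition.
at_rich = wallace_tm("ATATATAT")
gc_rich = wallace_tm("GCGCGCGC")
```

Under this approximation the G/C-rich octamer melts well above the A/T-rich one, which is why a single hybridization temperature cannot be optimal for every probe in an array when no ligation step is used.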

Another embodiment of the invention is directed to methods for
determining a nucleotide sequence by hybridization which comprises the steps of
creating a set of nucleic acid probes wherein each probe has a double-stranded portion,
a single-stranded portion, and a random sequence within the single-stranded portion
which is determinable, hybridizing a target nucleic acid which is at least partly single-
stranded to the set of nucleic acid probes, enzymatically extending a strand of the probe
using the hybridized target as a template, and determining the nucleotide sequence of
the single-stranded portion of the target nucleic acid. This embodiment of the invention
is similar to the previous embodiment, as broadly described herein, and includes all of
the aspects and advantages described therein. An alternative embodiment also includes
a step wherein hybridized target is ligated to the probe. Ligation increases the fidelity
of the hybridization and allows for a more stringent wash step wherein incorrectly
hybridized, unligated target can be removed and further, allows for a single set of
hybridization conditions to be employed. Most nonligation techniques including
Southern (1989), Drmanac et al. (1993), and Khrapko et al. (1989 and 1991), are only
accurate, and only marginally so, when hybridizations are performed under optimal
conditions which vary with the G/C content of each interaction. Preferable conditions
comprise a hybridization temperature of between about 22-37°C, a salt concentration
of between about 0.05-0.5 M, and a hybridization period of between about 1-14 hours.

Hybridization produces either a 5' overhang or a 3' overhang of target 
nucleic acid. Where there is a 5' overhang, a 3'-hydroxyl is available on one strand of 
the probe from which nucleotide addition can be initiated. Preferred enzymes for this 
process include eukaryotic or prokaryotic polymerases such as T3 or T7 polymerase, 
Klenow fragment, or Taq polymerase. Each of these enzymes is readily available to 
those of ordinary skill in the art, as are procedures for their use (Current Protocols in 
Molecular Biology). 

Hybridized probes may also be enzymatically extended a predetermined 
length. For example, reaction conditions can be established wherein a single dNTP or 
ddNTP is utilized as substrate. Only hybridized probes wherein the first nucleotide to 
be incorporated is complementary to the target sequence will be extended, thus 
providing additional hybridization fidelity and additional information regarding the 
nucleotide sequence of the target. Sanger (1977) or Maxam and Gilbert (1977) 
sequencing can be performed, which would provide further target sequence data. 
Alternatively, hybridization of target to probe can produce 3' extensions of target 
nucleic acids. Hybridized probes can be extended using nucleoside biphosphate 
substrates or short sequences which are ligated to the 5' terminus. 
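
The single-nucleotide extension described above can be sketched as a simple base-pairing check. This is a minimal illustration assuming ideal Watson-Crick pairing and an error-free polymerase; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_extended(template_base: str, supplied_nucleotide: str) -> bool:
    """True if the single supplied dNTP/ddNTP pairs with the first
    unpaired target base next to the probe's 3'-hydroxyl."""
    return COMPLEMENT[template_base.upper()] == supplied_nucleotide.upper()


def identify_template_base(extension_results: dict) -> str:
    """Infer the template base from four separate single-nucleotide
    reactions, given {nucleotide: was the probe extended?}."""
    incorporated = [n for n, ok in extension_results.items() if ok]
    if len(incorporated) != 1:
        raise ValueError("expected exactly one incorporated nucleotide")
    return COMPLEMENT[incorporated[0].upper()]


print(is_extended("G", "C"))  # the probe is extended only by dCTP here
print(identify_template_base({"A": False, "C": True, "G": False, "T": False}))
```

Each successful extension therefore reports one additional base of the target beyond the hybridized region.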

Another embodiment of the invention is directed to a method for 
determining a nucleotide sequence of a target by hybridization comprising the steps of 
creating a set of nucleic acid probes wherein each probe has a double-stranded portion, 
a single-stranded portion, and a random nucleotide sequence within the single-stranded 
portion which is determinable, cleaving a plurality of nucleic acid targets to form 
fragments of various lengths which are at least partly single-stranded, hybridizing the 
single-stranded region of the fragments with the single-stranded region of the probes, 
identifying the nucleotide sequences of the hybridized portions of the fragments, and 
comparing the identified nucleotide sequences to determine the nucleotide sequence of 
the target. An alternative embodiment includes a further step wherein the hybridized 
fragments are ligated to the probes prior to identifying the nucleotide sequences of the 
hybridized portions of the fragments. As described herein, the addition of a ligation 
step allows for hybridizations to be performed under a single set of hybridization 
conditions. 

In these embodiments, target nucleic acid is partially cleaved, forming a 
plurality of nucleic acid fragments of various lengths, a nested set, which is then 
hybridized to the probe. It is preferred that cleavage occurs by enzymatic, chemical or 
physical means. Preferred enzymes for partial cleavage are exonuclease III, S1 nuclease, 
DNase I, Bal 31, mung bean nuclease, P1 nuclease, lambda exonuclease, restriction 
endonuclease, and RNase I. Preferred means for chemical cleavage are ultraviolet light 
induced cleavage, ethidium bromide induced cleavage, and cleavage induced with acid 
or base. Preferred means for mechanical cleavage are shearing through direct agitation 
such as vortexing or multiple cycles of freeze-thawing. Procedures for enzymatic, 
chemical or physical cleavage are disclosed in, for example, Molecular Cloning: A 
Laboratory Manual (T. Maniatis et al., editors, Cold Spring Harbor 1989), which is 
herein specifically incorporated by reference. 
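
How a nested set of fragments can expose the whole target sequence through its terminal sequences may be sketched as follows. This is a simplified illustration with several assumptions not stated in the text: cleavage is idealized as removing one base at a time from the 3' end, every fragment length is represented, the length of each fragment is known from a separate measurement, and the probe overhang reads k = 5 terminal bases. The function names and the example target are hypothetical.

```python
def nested_fragments(target: str, min_len: int = 5) -> list[str]:
    """All fragments sharing the target's 5' end, shortened one base
    at a time from the 3' end (an idealized nested set)."""
    return [target[:n] for n in range(len(target), min_len - 1, -1)]


def terminal_reads(fragments: list[str], k: int = 5) -> dict[int, str]:
    """Map each fragment length to its 3'-terminal k-mer, i.e. the
    part assumed to hybridize to a probe overhang."""
    return {len(f): f[-k:] for f in fragments}


target = "ATGCGTACCTGA"
reads = terminal_reads(nested_fragments(target))

# The shortest fragment gives the first k bases; every longer fragment
# contributes the one new base at its 3' terminus.
rebuilt = reads[5] + "".join(reads[n][-1] for n in range(6, len(target) + 1))
print(rebuilt == target)
```

Because consecutive terminal reads overlap by k-1 bases, each read also confirms the preceding ones, which is one way to see the redundancy of the approach.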

Fragmented target nucleic acids will have a distribution of terminal 
sequences which is sufficiently broad so that the nucleotide sequence of the hybridized 
fragments will include the entire sequence of the target nucleic acid. A preferred 
method is wherein the set of nucleic acid probes is fixed to a solid support. A preferred 
solid support is a plastic, a ceramic, a metal or magnetic substance, a resin, a film or 
other polymer, a gel, or a membrane, and it is more preferred that the solid support be 
a two-dimensional or three-dimensional matrix with multiple probe binding sites, such 
as a hybridization chip as described by K.R. Khrapko et al. (J. DNA Sequencing and 
Mapping 1:375-88, 1991). It is also preferred wherein the target nucleic acid has a 
detectable label such as a radioisotope, a stable isotope, an enzyme, a fluorescent 
chemical, a luminescent chemical, a chromatic chemical, a metal, an electric charge, or 
a spatial structure. 

As an extension of this procedure, it is also possible to use the methods 
herein described to determine the nucleotide sequence of one or more probes which 
hybridize with an unknown target sequence. For example, fragmented targets could be 
terminally or internally labeled, hybridized with a set of nucleic acid probes, and the 
hybridized sequences of the probes determined. This aspect may be useful when it is 
cumbersome to determine the sequence of the entire target and only a smaller region 
of that sequence is of interest. 

Another embodiment of the invention is directed to a method wherein the 
target nucleic acid has a first detectable label at a terminal site and a second detectable 
label at an internal site. The labels may be the same type of label or of different types 
as long as each can be discriminated, preferably by the same detection method. It is 
preferred that the first and second detectable labels are chromatic or fluorescent 
chemicals or molecules which are detectable by mass spectrometry. Using a double- 
labeling method coupled with analysis by mass spectrometry provides a very rapid and 
accurate sequencing methodology that can be incorporated in sequencing by 
hybridization and lends itself very well to automation and computer control. 

Another embodiment of the invention is directed to methods for creating 
a nucleic acid probe comprising the steps of synthesizing a plurality of single-stranded 
first nucleic acids and an array of longer single-stranded second nucleic acids 
complementary to the first nucleic acid with a random terminal nucleotide sequence, 
hybridizing the first nucleic acids to the second nucleic acids to form hybrids having a 
double-stranded portion and a single-stranded portion with the random nucleotide 
sequence within the single-stranded portion, hybridizing a single-stranded nucleic acid 
target to the hybrids, ligating the hybridized target to the first nucleic acid of the hybrid, 
isolating the second nucleic acid, and hybridizing the first nucleic acid with the 
isolated second nucleic acid to form a nucleic acid probe. Probes created in this manner 
are referred to herein as customized probes. 

A preferred customized probe comprises a first nucleic acid which is about 
15-25 nucleotides in length and a second nucleic acid which is about 20-30 nucleotides in 
length. It is also preferred that the double-stranded portion contain an enzyme 
recognition site, which allows for increased flexibility of use and facilitates cloning 
should it at some point become desirable to clone one or more of the probes. It is also 
preferred if the customized probe is fixed to a solid support, such as a plastic, a 
ceramic, a metal, a resin, a film or other polymer, a gel, or a membrane, or possibly a 
two- or three-dimensional array such as a chip or microchip. 

Customized probes, created by the method of this invention, have a wide 
range of uses. These probes are, first of all, structurally useful for identifying and 
binding to only those sequences which are homologous to the overhangs. Secondly, the 
overhangs of these probes possess the nucleotide sequence of interest. No further 
manipulation is required to carry the sequence of interest to another structure. 
Therefore, the customized probes greatly lend themselves to use in, for example, 
diagnostic aids for the genetic screening of a biological sample. 

Another embodiment of the invention is directed to arrays of nucleic acid 
probes wherein each probe comprises a double-stranded portion of length D, a terminal 
single-stranded portion of length S, and a random nucleotide sequence within the single- 
stranded portion of length R. Preferably, D is between about 3-20 nucleotides and S 
is between about 3-20 nucleotides, and the entire array is fixed to a solid support which 
may be composed of plastics, ceramics, metals, resins, polymers and other films, gels, 
membranes and two-dimensional and three-dimensional matrices such as hybridization 
chips or microchips. Probe arrays are useful in sequencing and diagnostic applications 
whether the sequence and/or position on a solid support of every probe of the array is 
known or is unknown. In either case, information about the target nucleic acid may be 
obtained and the target nucleic acid detected, identified and sequenced as described in 
the methods described herein. Arrays comprise 4^R different probes representing every 
member of the random sequence of length R, but arrays of less than 4^R are also 
encompassed by the invention. 
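
The size of a complete array follows directly from the length R of the random sequence, since each position can be any of four bases. A minimal sketch, with a hypothetical function name, enumerates the random sequences for a given R:

```python
from itertools import product


def all_overhangs(r: int) -> list[str]:
    """Every possible random sequence of length r over A, C, G, T."""
    return ["".join(bases) for bases in product("ACGT", repeat=r)]


probes = all_overhangs(5)
print(len(probes))             # 4**5 members for R = 5
print(probes[0], probes[-1])   # first and last in lexicographic order
```

For R = 5 this gives the 1024-member array discussed in the examples; each additional base of random sequence multiplies the array size by four.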

Another embodiment of the invention is directed to methods for creating 
probe arrays comprising the steps of synthesizing a first set of nucleic acids each 
comprising a constant sequence of length C at a 3'-terminus and a random sequence of 
length R at a 5'-terminus, synthesizing a second set of nucleic acids each comprising a 
sequence complementary to the constant sequence of each of the first nucleic acids, and 
hybridizing the first set with the second set to create the array. Preferably, the nucleic 
acids of the first set are each between about 15-30 nucleotides in length and the nucleic 
acids of the second set are each between about 10-25 nucleotides in length. Also 
preferable is that C is between about 7-20 nucleotides and R is between about 3-10 
nucleotides. 

Arrays may comprise about 4^R different probes, but in certain applications, 
an entire array of every possible sequence is not necessary and incomplete arrays are 
acceptable for use. For example, incomplete arrays may be utilized for screening 
procedures of very rare target nucleic acids where nonspecific hybridization is not 
expected to be problematic. Further, every member of an array may not be needed 
when detecting or sequencing smaller nucleic acids where the chance of requiring 
certain combinations of nucleotides is so low as to be practically nonexistent. Arrays 
which are fixed to solid supports are expected to be most useful, although arrays in 
solution also have many applications. Solid supports which are useful include plastics 
such as microtiter plates, beads and microbeads, ceramics, metals where resilience is 
desired or magnetic beads for ease of isolation, resins, gels, polymers and other films, 
membranes or chips such as the two- and three-dimensional sequencing chips utilized 
in sequencing technology. 

Alternatively, probe arrays may also be made which are single-stranded. 
These arrays are created, preferably on a solid support, basically as described, by 
synthesizing an array of nucleic acids each comprising a constant sequence of length C 
at a 3'-terminus and a random sequence of length R at a 5'-terminus, and fixing the 
array to a first solid support. Arrays created in this manner can be quickly and easily 
transformed into double-stranded arrays by the synthesis and hybridization of a set of 
nucleic acids with a sequence complementary to the constant sequence of the replicated 
array to create a double-stranded replicated array. However, in their present form, 
single-stranded arrays are very valuable as templates for replication of the array. 

Due to the very large numbers of probes which comprise most useful 
arrays, there is a great deal of time spent in simply creating the array. It requires many 
hours of nucleic acid synthesis to create each member of the array and many hours of 
manipulations to place the array in an organized fashion onto any solid support such as 
those described previously. Once the master array is created, replicated arrays, or slaves, 
can be quickly and easily created by the methods of the invention which take advantage 
of the speed and accuracy of nucleic acid polymerases. Basically, methods for 
replicating an array of single-stranded probes on a solid support comprise the steps of 
synthesizing an array of nucleic acids each comprising a constant sequence of length C 
at a 3'-terminus and a random sequence of length R at a 5'-terminus, fixing the array 
to a first solid support, synthesizing a set of nucleic acids each comprising a sequence 
complementary to the constant sequence, hybridizing the nucleic acids of the set with the 
array, enzymatically extending the nucleic acids of the set using the random sequences 
of the array as templates, denaturing the set of extended nucleic acids, and fixing the 
denatured nucleic acids of the set to a second solid support to create the replicated 
array of single-stranded probes. 
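
The replication steps above can be modeled at the sequence level. The sketch below assumes error-free, full-length extension and writes every strand 5' to 3'; the function names and the example sequences are hypothetical. In this simplified model the extended strand is the reverse complement of the master probe, so the replica carries the complement of each random sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def replicate(master_probe: str, constant: str) -> str:
    """Extend a primer complementary to the 3'-terminal constant
    sequence, using the 5'-terminal random sequence as template."""
    if not master_probe.endswith(constant):
        raise ValueError("constant sequence must be at the 3'-terminus")
    primer = reverse_complement(constant)
    random_part = master_probe[: len(master_probe) - len(constant)]
    # The polymerase adds the complement of the random region
    # to the 3' end of the hybridized primer.
    return primer + reverse_complement(random_part)


master = "ACGTT" + "GGATCCGA"   # random sequence (R = 5) + constant sequence
copy = replicate(master, "GGATCCGA")
print(copy)
```

After denaturation, the extended strand is the one transferred to the second support, which is why the speed and accuracy of the polymerase, rather than chemical synthesis, determine the cost of each additional array.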

Denaturation of the array can be performed by subjecting the array to 
heat, for example 90-100°C for 2-15 minutes, or highly alkaline conditions, such as by 
the addition of sodium hydroxide. Denaturation can also be accomplished by adding 
organic solvents, nucleic acid binding proteins or enzymes which promote denaturation 
to the array. Preferably, the solid supports are coated with a substance such as 
streptavidin and the nucleic acid reagents conjugated with biotin. Denaturation of the 
partial duplex leads to binding of the nucleic acids to the solid support. 

Another embodiment of the invention is directed to methods for creating 
arrays of probes comprising the steps of synthesizing an array of single-stranded nucleic 
acids each containing a constant sequence at the 3'-terminus, another constant sequence 
at the 5'-terminus, and a random internal sequence of length R flanked by the cleavage 
site(s) of a restriction enzyme (on one or both sides), synthesizing an array of primers 
each complementary to a portion of the constant sequence of the 3'-terminus, 
hybridizing the two arrays together to form hybrids, extending the sequence of each 
primer by polymerization using a sequence of the nucleic acid as a template, and 
cleaving the extended hybrids with the restriction enzyme to form an array of probes 
with a double-stranded portion at one terminus and a single-stranded portion containing 
the random sequence at the opposite terminus. Preferably, the nucleic acids are each 
between about 10-50 nucleotides in length and R is between about 3-5 nucleotides in 
length. Any of the restriction enzymes which produce a 3'- or 5'-overhang after cleavage 
are suitable for use to make the array. Some of the restriction enzymes which are useful 
in this regard, and their recognition sequences, are depicted in Table 1. 

Table 1 

Restriction      Recognition Sequence          Overhang 
Enzyme 

AlwN I           5'-CAG NNN↓CTG                3' 
                 3'-GTC↑NNN GAC 

Bbv I            5'-GCAGC(N)8↓                 5' 
                 3'-CGTCG(N)12↑ 

Bgl I            5'-GCCN NNN↓NGGC              3' 
                 3'-CGGN↑NNN NCCG 

BstX I           5'-CCAN NNNN↓NTGG             3' 
                 3'-GGTN↑NNNN NACC 

Dra III          5'-CAC NNN↓GTG                3' 
                 3'-GTG↑NNN CAC 

Fok I            5'-GGATG(N)9↓                 5' 
                 3'-CCTAC(N)13↑ 

Hga I            5'-GACGC(N)5↓                 5' 
                 3'-CTGCG(N)10↑ 

PflM I           5'-CCAN NNN↓NTGG              3' 
                 3'-GGTN↑NNN NACC 

SfaN I           5'-GCATC(N)5↓                 5' 
                 3'-CGTAG(N)9↑ 

Sfi I            5'-GGCCN NNN↓NGGCC            3' 
                 3'-CCGGN↑NNN NCCGG 

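
For the enzymes in Table 1 that cut at a fixed distance outside their recognition sequence, the overhang left after cleavage can be computed from the two cut offsets. The sketch below is a simplified model that assumes the recognition site occurs once, in the forward orientation, on the top strand; the function name and the example sequence are hypothetical, and the Hga I offsets (5 and 10) are those listed in Table 1.

```python
def five_prime_overhang(top_strand: str, site: str,
                        top_offset: int, bottom_offset: int) -> str:
    """Single-stranded bases left between the top-strand cut and the
    bottom-strand cut, both counted from the end of the site."""
    start = top_strand.find(site)
    if start < 0:
        raise ValueError("recognition site not found")
    site_end = start + len(site)
    cut_top = site_end + top_offset
    cut_bottom = site_end + bottom_offset
    if cut_bottom > len(top_strand):
        raise ValueError("sequence too short for the bottom-strand cut")
    return top_strand[cut_top:cut_bottom]


# Hga I: 5'-GACGC(N)5 on the top strand, (N)10 on the bottom strand.
sequence = "TT" + "GACGC" + "AAAAA" + "CTGAT" + "GG"
print(five_prime_overhang(sequence, "GACGC", 5, 10))
```

Because the cut falls outside the recognition sequence, the five overhang bases can be any sequence, which is what allows the random region of each probe to be exposed as the single-stranded portion.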
Also preferred is that the array be fixed to a solid support such as a plastic, 
ceramic, metal, resin, polymer, gel, [illegible] accomplished by [illegible]. 

Alternatively, another similar method for creating [illegible] 
comprising the steps of synthesizing an array of single-stranded nucleic acids each 
containing a constant sequence at the [illegible], synthesizing an array of primers 
[illegible] complementary to the constant sequence at the 3'-terminus, hybridizing the two 
arrays together to form hybrids, enzymatically extending the primers using the nucleic 
acids as templates to form full-length hybrids, cloning the full-length hybrids into vectors 
such as plasmids or phage, [illegible] isolating the cloned [illegible], [illegible] 
polymerase chain reaction, and [illegible] to form the array [illegible]. 

Using this method the array of probes may have 5'- or 3'-overhangs depending on the 
cleavage specificity of the restriction enzyme (e.g., Table 1). The array of probes can 
be fixed to a solid support such as a plastic, ceramic, metal, resin, polymer, film, gel, 
membranes and chip. Preferably, during PCR amplification, the reagent primers are 
conjugated with biotin, which facilitates eventual binding to a streptavidin coated surface. 

Another embodiment of the invention is directed to methods for using 
customized probes, arrays, and replicated arrays, as described herein, in diagnostic aids 
to screen biological samples for specific nucleic acid sequences. Diagnostic aids and 
methods for using diagnostic aids would be very useful when sequence information at 
a particular locus of, for example, DNA is desired. Single nucleotide mutations or more 
complex nucleic acid fingerprints can be identified and analyzed quickly, efficiently, and 
easily. Such an approach would be immediately useful for the detection of individual 
and family genetic variation, of inherited mutations such as those which cause a disease, 
DNA dependent normal phenotypic variation, DNA dependent somatic variation, and 
the presence of heterologous nucleic acid sequences. 

Especially useful are diagnostic aids comprising probe arrays. These 
arrays can make the detection, identification, and sequencing of nucleic acids from 
biological samples exceptionally rapid and allow one to obtain multiple pieces of 
information from a single sample after performing a single test. Methods for detecting 
and/or identifying a target nucleic acid in a biological sample comprise the steps of 
creating an array of probes fixed to a solid support as described herein, labeling the 
nucleic acid of the biological sample with a detectable label, hybridizing the labeled 
nucleic acid to the array, and detecting the sequence of the nucleic acid from a binding 
pattern of the label on the array. These methods for creating probe arrays and for 
rapidly and efficiently replicating those arrays, such as for diagnostic aids, make the 
manufacture and commercial application of large numbers of arrays a possibility. 
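
Reading a sequence from a binding pattern can be sketched as a lookup between array positions and overhang sequences. This is a minimal model under stated assumptions: the overhang at every position is known, hybridization is perfectly specific, and a labeled target binds through its 3'-terminal k bases. The array layout, function names, and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def binding_pattern(array: dict, targets: list[str], k: int) -> set:
    """Positions whose overhang pairs with the 3'-terminal k bases
    of at least one labeled target."""
    terminal = {reverse_complement(t[-k:]) for t in targets}
    return {pos for pos, overhang in array.items() if overhang in terminal}


def read_terminal_sequences(array: dict, hits: set) -> list[str]:
    """Target terminal sequences implied by the labeled positions."""
    return sorted(reverse_complement(array[pos]) for pos in hits)


array = {(0, 0): "AAC", (0, 1): "GTT", (1, 0): "CCC"}
hits = binding_pattern(array, ["TTTGTT"], k=3)
print(hits, read_terminal_sequences(array, hits))
```

The same lookup applies whether the array is a master or a replica, provided the replica's position-to-sequence map is known.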

As described, these diagnostic aids are useful to humans, other animals, 
and even plants for the detection of infections due to viruses, bacteria, fungi or yeast, 
and for the detection of certain parasites. These detection methods and aids are also 
useful in the feed and food industries and in the environmental field for the detection, 
identification and sequencing of nucleic acids associated with samples obtained from 
environmental sources and from manufacturing products and by-products. 

Diagnostic aids comprise specific nucleic acid probes fixed to a solid 
support to which is added the biological sample. Hybridization of target nucleic acids 
is determined by adding a detectable label, such as a labeled antibody, which will 
specifically recognize only hybridized targets or, alternatively, unhybridized target is 
washed off and labeled target specific antibodies are added. In either case, appearance 
of label on the solid support indicates the presence of nucleic acid target hybridized to 
the probe and, consequently, within the biological sample. 

Customized probes may also prove useful in prophylaxis or therapy by 
directing a drug, antigen, or other substance to a nucleic acid target with which it will 
hybridize. The substance to be targeted can be bound to the probe so as not to 
interfere with possible hybridization. For example, if the probe was targeted to a viral 
nucleic acid target, an effective antiviral could be bound to the probe which will then 
be able to specifically carry the antiviral to infected cells. This would be especially 
useful when the treatment is harmful to normal cells and precise targeting is required 
for efficacy. 

Another embodiment of the invention is directed to methods for creating 
a nucleic acid probe comprising the steps of synthesizing a plurality of single-stranded 
first nucleic acids and an array of longer single-stranded second nucleic acids 
complementary to the first nucleic acid with a random terminal nucleotide sequence, 
hybridizing the first nucleic acids to the second nucleic acids to form hybrids having a 
double-stranded portion and a single-stranded portion with the random nucleotide 
sequence within the single-stranded portion, hybridizing a single-stranded nucleic acid 
target to the hybrids, ligating the hybridized target to the first nucleic acid of the hybrid, 
hybridizing the ligated hybrid with an array of oligonucleotides with random nucleotide 
sequences, ligating the hybridized oligonucleotide to the second nucleic acid of the 
ligated hybrid, isolating the second nucleic acid, and hybridizing another first nucleic 
acid with the isolated second nucleic acid to form a nucleic acid probe. Preferred is that 
the first nucleic acid is about 15-25 nucleotides in length, that the second nucleic acid 
is about 20-30 nucleotides in length, that the constant portion contain an enzyme 
recognition site, and that the oligonucleotides are each about 4-20 nucleotides in length. 
Probes may be fixed to a solid support such as a plastic, ceramic, a metal, a resin, a gel, 
or a membrane. It is preferred that the solid support be a two-dimensional or three- 
dimensional matrix with multiple probe binding sites, such as a hybridization chip. 
Nucleic acid probes created by the method of the present invention are useful in a 
diagnostic aid to screen a biological sample for genetic variations of nucleic acid 
sequences therein. 

Another embodiment of the invention is directed to a method for creating 
a nucleic acid probe comprising the steps of (a) synthesizing a plurality of single- 
stranded first nucleic acids and a set of longer single-stranded second nucleic acids 
complementary to the first nucleic acid with a random terminal nucleotide sequence, (b) 
hybridizing the first nucleic acids to the second nucleic acids to form hybrids having a 
double-stranded portion and a single-stranded portion with the random nucleotide 
sequence in the single-stranded portion, (c) hybridizing a single-stranded nucleic acid 
target to the hybrids, (d) ligating the hybridized target to the first nucleic acid of the 
hybrid, (e) enzymatically extending the second nucleic acid using the target as a 
template, (f) isolating the extended second nucleic acid, and (g) hybridizing the first 
nucleic acid of step (a) with the isolated second nucleic acid to form a nucleic acid 
probe. It is preferred that the first nucleic acid is about 15-25 nucleotides in length, that 
the second nucleic acid is about 20-30 nucleotides in length, and that the double- 
stranded portion contain an enzyme recognition site. It is also preferred that the probe 
be fixed to a solid support, such as a plastic, ceramic, a metal, a resin, a gel or a 
membrane. A preferred solid support is a two-dimensional or three-dimensional matrix 
with multiple probe binding sites, such as a hybridization chip. A further embodiment 
of the present invention is a diagnostic aid comprising the created nucleic acid probe 
and a method for using the diagnostic aid to screen a biological sample as herein 
described. 

As an extension of this procedure, it is also possible to use the methods 
herein described to determine the nucleotide sequence of one or more probes which 
hybridize with an unknown target sequence. For example, Sanger dideoxynucleotide 
sequencing techniques could be used when enzymatically extending the second nucleic 
acid using the target as a template and labeled substrate, extended products could be 
resolved by polyacrylamide gel electrophoresis, and the hybridized sequences of the 
probes easily read off the gel. This aspect may be useful when it is cumbersome to 
determine the sequence of the entire target and only a smaller region of that sequence 
is of interest. 

The following examples illustrate embodiments of the invention, but 
should not be viewed as limiting the scope of the invention. 

Examples 
Example 1 

Manipulation of DNA in the solid state. Complexes between streptavidin 
(or avidin) and biotin represent the standard way in which much solid state DNA 
sequencing or other DNA manipulation is done, and one of the standard ways in which 
[illegible]. [illegible] biotin technology has expanded in several ways. Several years ago, 
the gene for streptavidin was cloned and sequenced (C.E. Argarana et al., Nuc. Acids 
Res. 14:1871, 1986). More recently, using the Studier T7 system, over-expression of the 
protein in E. coli was achieved (T. Sano and C.R. Cantor, Proc. Natl. Acad. Sci. USA 
87:14, 1990). [illegible] properties [illegible] (T. Sano and C.R. Cantor, Bio/Technology 
9:1378-81, 1991). The most relevant of these is core streptavidin (an active protein with 
extraneous N- and C-terminal peptides removed) [illegible] residues attached to the 
C-terminus. An active protein fusion of streptavidin to two IgG binding domains of 
staphylococcal A protein was also produced (T. Sano and C.R. Cantor, Bio/Technology 
9:1378-81, 1991). This allowed biotinylated DNAs to be attached to specific 
Immunoglobulin G molecules without the need for any covalent chemistry, and it has led 
to the development of immuno-PCR, an exceedingly sensitive method for detecting 
antigens (T. Sano et al., Sci. 258:120-29, 1992). 

A protein fusion between streptavidin and metallothionein was recently 
constructed (T. Sano et al., Proc. Natl. Acad. Sci. USA, 1992). Both partners in this 
protein fusion are fully active and these streptavidin-biotin interactions are being used 
to develop new methods for purification of DNA, including triplex-mediated capture of 
duplex DNA on magnetic microbeads (T. Ito et al., Proc. Natl. Acad. Sci. USA 89:495- 
98, 1992) and affinity capture electrophoresis of DNA in agarose (T. Ito et al., G.A.T.A., 
1992). 

An examination of the potential advantages of stacking hybridization has 
been carried out by both calculations and pilot experiments. Some calculated Tm's for 
perfect and mismatched duplexes are shown in Figure 1. These are based on average 
base compositions. The calculations were performed using the equations given by J.G. 
Wetmur (Crit. Rev. in Biochem. and Mol. Biol. 26:227-59, 1991). In the case of 
oligonucleotide stacking, these researchers assumed that the first duplex is fully formed 
under the conditions where the second oligomer is being tested; in practice this may not 
always be the case. It will, however, be the case for the configuration shown in Figure 
1. The calculations reveal a number of interesting features about stacking hybridization. 
Note that the binding of a second oligomer next to a pre-formed duplex provides an 
extra stability equal to about two base pairs. More interesting still is the fact that 
mispairing seems to have a larger consequence on stacking hybridization than it does 
on ordinary hybridization. This is consistent with the very large effects seen by K.R. 
Khrapko et al. (J. DNA Sequencing and Mapping 1:375-88, 1991) for certain types of 
mispairing. Other types of mispairing are less destabilizing, but these can be eliminated 
by requiring a ligation step. In standard SBH, a terminal mismatch is the least 
destabilizing event and thus leads to the greatest source of ambiguity or background. 
For an octanucleotide complex, an average terminal mismatch leads to a 6°C lowering 
in Tm. For stacking hybridization, a terminal mismatch on the side away from the pre- 
existing duplex is the least destabilizing event. For a pentamer, this leads to a drop in 
Tm of 10°C. These considerations indicate that the discrimination power of stacking 
hybridization in favor of perfect duplexes might be greater than that of ordinary SBH. 

Example 2 



Terminal sequencing by positional hybridization. [illegible] 
it uses a duplex oligonucleotide array with [illegible]-ended single-stranded overhangs. 
The duplex portion of each DNA shown [illegible]. Only the overhangs [illegible] 
principle an array of 4^n probes is [illegible] the preformed DNA duplex and the newly 
formed [illegible] the duplex are also important variables. Initially one 5' end 
[illegible] unlabeled DNA may [illegible] important where [illegible] the 3' overhang of 
the array can detect the [illegible]. In some subsequent examples it [illegible] 
absolutely specific for [illegible] end [illegible] 
because it does not allow for the use of polymerases to enhance the length and accuracy 
of the sequence read. 

Example 3 

Preparation of model arrays. Following the scheme shown in Figure 2, in 
a single synthesis, all 1024 possible single-stranded probes with a constant 18 base stalk 
followed by a variable 5 base extension can be created. The 18 base stalk is 
designed to contain two restriction enzyme cutting sites. Hga I generates a 5 base, 5' 
overhang consisting of the variable bases N5. Not I generates a 4 base, 5' overhang at 
the constant end of the oligonucleotide. The synthetic 23-mer mixture will be hybridized 
with a complementary 18-mer to form a duplex which can then be enzymatically 
extended to form all 1024, 23-mer duplexes. These can be cloned by, for example, blunt 
end ligation, into a plasmid which lacks Not I sites. Colonies containing the cloned 23- 
base insert can be selected. Each should be a clone of one unique sequence. DNA 
minipreps can be cut at the constant end of the stalk, filled in with biotinylated 
pyrimidines, then cut at the variable end of the stalk, to generate the 5 base 5' overhang. 
The resulting nucleic acid can be fractionated by Qiagen columns (nucleic acid 
purification columns) to discard the high molecular weight material, and the nucleic acid 
probe will then be attached to a streptavidin-coated surface. This procedure could 
easily be automated in a Beckman Biomek or equivalent chemical robot to produce 
many identical arrays of probes. 

The initial array contains about a thousand probes. The particular 
sequence at any location in the array will not be known. However, the array can be 
used for statistical evaluation of the signal to noise ratio and the sequence 
discrimination for different target molecules under different hybridization conditions. 
Hybridization with known nucleic acid sequences allows for the identification of 
particular elements of the array. A sufficient set of hybridizations would train the array 
for any subsequent sequencing task. Arrays are partially characterized until they have 
the desired properties. For example, the length of the oligonucleotide duplex, the mode 
of its attachment to a surface, and the hybridization conditions used, can all be varied 
using the initial set of cloned DNA probes. Once the sort of array that works best is 
determined, a complete and fully characterized array can then be constructed by 
ordinary chemical synthesis. 

Example 4 

Preparation of specific probe arrays. The major challenge for positional 
SBH is to build real arrays of probes, and test the fraction of sequences that actually 
perform according to expectations. Base composition and base sequence dependence 
on the effectiveness of hybridization is probably the greatest obstacle to successful 
implementation of these methods. The use of enzymatic steps, where feasible, may 
simplify these problems, since, after all, the enzymes do manage to work with a wide 
variety of DNA sequences in vivo. With positional SBH, one potential trick to 
compensate for some variations in stability would be to allow the adjacent duplex to 
vary. Thus, for an A+T rich overhang, one could use a G+C rich stacking duplex, and 
vice versa. 

Four methods for making arrays are tested and evaluated with two major 
objectives. The first is to produce, rapidly and inexpensively, arrays that will test some 
of the principles of positional SBH. The second is to develop effective methods for the 
automated preparation of full arrays needed for production sequencing via positional 
SBH. Since the first studies indicated that a five base overhang will be sufficient, arrays 
may only have to have 1024 members. The cost of making all of these compounds is 
actually quite modest. The constant portion of the probes can be made once, and then 
extended in parallel, by automated DNA synthesis methods. In the simplest case, this 
will require the addition of only 5 bases to each of 1024 compounds, which at typical 
chemical costs of $2 per base will amount to a total of about $10,000. 
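
The cost estimate above is simple arithmetic, sketched here for other overhang lengths. The function name and defaults are illustrative only; the $2 per base figure is the one quoted in the text.

```python
def array_extension_cost(overhang_length=5, cost_per_base=2.0):
    """Return (number of probes, total cost) for extending a pre-made
    constant portion by every possible overhang of the given length."""
    n_probes = 4 ** overhang_length
    total_cost = n_probes * overhang_length * cost_per_base
    return n_probes, total_cost


n_probes, cost = array_extension_cost()
print(n_probes, cost)
```

With the defaults this gives 1024 probes at a total of $10,240, consistent with the figure of about $10,000.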

Moderately dense arrays can be made using a typical x-y robot to spot the 
biotinylated compounds individually onto a streptavidin-coated surface. Using such 
robots, it is possible to make arrays of 2 x 10^4 samples in 100 to 400 cm^2 of nominal 
surface. The array should preferably fit in 10 cm^2, but even if forced, for unforeseen 
technical reasons, to compromise on an array ten times or even 50 times less dense, it 
will be quite suitable for testing the principles of and many of the variations on 
positional SBH. Commercially available streptavidin-coated beads can be adhered 
permanently to plastics like polystyrene, by exposing the plastic first to a brief treatment 
with an organic solvent like triethylamine. The resulting plastic surfaces have 
enormously high biotin binding capacity because of the very high surface area that 
results. This will suffice for radioactively labeled samples. 

For fluorescently labeled samples, the background scattering from such a 
bead-impregnated sample may interfere. In this case, a streptavidin-conjugated glass or 
plastic surface may be utilized (commercially available from Bios Products). Surfaces 
are made using commercially available amine-containing surfaces and using 
commercially available biotin-containing N-hydroxysuccinimide esters to make stable 
peptide conjugates. The resulting surfaces will bind streptavidin, at one biotin binding 
site (or at most two, but not more because the approximate 222 symmetry of the protein 
would preclude this), which would leave other sites available for binding to biotinylated 
oligonucleotides. 

In certain experiments, the need for attaching oligonucleotides to surfaces 
may be circumvented altogether, and oligonucleotides attached to streptavidin-coated 
magnetic microbeads used as already done in pilot experiments. The beads can be 
manipulated in microtitre plates. A magnetic separator suitable for such plates can be 
used including the newly available compressed plates. For example, the 18 by 24 well 
plates (Genetix, Ltd.; USA Scientific Plastics) would allow containment of the entire 
array in 3 plates; this format is well handled by existing chemical robots. It is 
preferable to use the more compressed 36 by 48 well format, so that the entire array 
would fit on a single plate. The advantages of this approach for all the experiments are 
that any potential complexities from surface effects can be avoided, and already-existing 
liquid handling, thermal control, and imaging methods can be used for all the 
experiments. Thus, this allows the characterization of many of the features of positional 
SBH before having to invest the time and effort in fabricating instruments, tools and 
chips. 
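
The plate counts quoted above can be checked with a short calculation. This is only a sketch that assumes one probe per well and a 1024-member array; the function name is illustrative.

```python
import math


def plates_needed(n_probes, rows, cols):
    """Number of plates required to hold n_probes at one probe per well."""
    return math.ceil(n_probes / (rows * cols))


print(plates_needed(1024, 18, 24))  # 432 wells per plate
print(plates_needed(1024, 36, 48))  # 1728 wells per plate
```

The 18 by 24 format needs three plates for 1024 probes, while the 36 by 48 format holds the whole array on one plate, as stated in the text.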

Lastly, a rapid and highly efficient method to print arrays has been 
developed. Master arrays are made which direct the preparation of replicas, or 
appropriate complementary arrays. A master array is made manually (or by a very 
accurate robot) by sampling a set of custom DNA sequences in the desired pattern and 
then transferring these sequences to the replica. The master array is just a set of all 
1024-4096 compounds. It is printed by multiple headed pipettes and compressed by 
offsetting. A potentially more elegant approach is shown in Figure 14. A master array 
is made and used to transfer components of the replicas in a sequence-specific way. The 
sequences to be transferred are designed so that they contain the desired 5 or 6 base 
5' variable overhang adjacent to a unique 15 base DNA sequence. 

The master array consists of a set of streptavidin bead-impregnated plastic 
coated metal pins, each of which, at its tip, contains immobilized biotinylated DNA 
strands that consist of the variable 5 or 6 base segment plus the constant 15 base 
segment. Any unoccupied sites on this surface are filled with excess free biotin. To 
produce a replica chip, the master array is incubated with the complement of the 15 
base constant sequence, 5'-labeled with biotin. Next, DNA polymerase is used to 
synthesize the complement of the 5 or 6 base variable sequence. Then the wet pin array 
is touched to the streptavidin-coated surface of the replica, held at a temperature above 
the Tm of the complexes on the master array. If there is insufficient liquid carryover 
from the pin array for efficient sample transfer, the replica array could first be coated 
with spaced droplets of solvent (either held in concave cavities, or delivered by a 
multiheaded pipettor). After the transfer, the replica chip is incubated with the 
complement of the 15 base constant sequence to reform the double-stranded portions of the 
array. The basic advantage of this scheme, if it can be realized, is that the master array 
and transfer compounds are made only once, and then the manufacture of replica arrays 
should be able to proceed almost endlessly. 



WO 94/11530                                                      PCT/US93/10616 



Example 5 

DNA ligation to oligonucleotide arrays. Following the schemes shown in 
Figures 3A and 3B, E. coli and T4 DNA ligases can be used to covalently attach 
hybridized target nucleic acid to the correct immobilized oligonucleotide probe. This 
is a highly accurate and efficient process. Because ligase absolutely requires a correctly 
base paired 3' terminus, ligase will read only the 3'-terminal sequence of the target 
nucleic acid. After ligation, the resulting duplex will be 23 base pairs long and it will 
be possible to remove unhybridized, unligated target nucleic acid using fairly stringent 
washing conditions. Appropriately chosen positive and negative controls demonstrate 
the power of this scheme, such as arrays which are lacking a 5'-terminal phosphate 
adjacent to the 3' overhang since these probes will not ligate to the target nucleic acid. 

There are a number of advantages to a ligation step. Physical specificity 
is supplanted by enzymatic specificity. Focusing on the 3' end of the target nucleic acid 
also minimizes problems arising from stable secondary structures in the target DNA. As 
shown in Figure 3B, ligation can be used to enhance the fidelity of detecting the 5'- 
terminal sequence of a target DNA. 

DNA ligases are also used to covalently attach hybridized target DNA to 
the correct immobilized oligonucleotide probe. Several tests of the feasibility of the 
ligation scheme shown in Figure 3 were performed. Biotinylated probes were attached to 
streptavidin-coated magnetic microbeads, and annealed with a shorter, complementary, 
constant sequence to produce duplexes with 5 or 6 base single-stranded overhangs. One 
set of actual sequences used is shown in Example 14. 32P-end labeled targets were 
allowed to hybridize to the probes. Free targets were removed by capturing the beads 
with a magnetic separator. DNA ligase was added and ligation was allowed to proceed at 
various salt concentrations. The samples were washed at room temperature, again 
manipulating the immobilized compounds with a magnetic separator. This should 
remove non-ligated material. Finally, samples were incubated at a temperature above 
the Tm of the duplexes, and eluted single strand was retained after the remainder of the 
samples were removed by magnetic separation. The eluate at this point should consist 
of the ligated material. The fraction of ligation was estimated as the amount of 32P 
recovered in the high temperature wash versus the amount recovered in both the high 
and low temperature washes. Results obtained are shown in Figure 13. It is apparent 
that salt conditions can be found where the ligation proceeds efficiently with perfectly 
matched 5 or 6 base overhangs, but not with G-T mismatches. 
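
The ligation fraction described above is the label recovered in the high temperature wash divided by the label recovered in both washes. A minimal sketch follows; the function name and the example counts are hypothetical and are not data from these experiments.

```python
def ligation_fraction(hot_wash_counts, cold_wash_counts):
    """Fraction of label that was ligated.

    hot_wash_counts:  32P recovered in the high temperature wash (ligated material)
    cold_wash_counts: 32P recovered in the low temperature wash (non-ligated material)
    """
    total = hot_wash_counts + cold_wash_counts
    if total <= 0:
        raise ValueError("no label recovered in either wash")
    return hot_wash_counts / total


# Hypothetical counts, for illustration only.
print(ligation_fraction(1700, 8300))
```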

The results of a more extensive set of similar experiments are shown in 
Tables 2-4. Table 2 looks at the effect of the position of the mismatch and Table 3 
examines the effect of base composition on the relative discrimination of perfect 
matches versus weakly destabilizing mismatches. These data demonstrate that: (1) 
effective discrimination between perfect matches and single mismatches occurs with all 
five base overhangs tested; (2) there is little if any effect of base composition on the 
amount of ligation seen or the effectiveness of match/mismatch discrimination. Thus, 
the serious problems of dealing with base composition effects on stability seen in 
ordinary SBH do not appear to be a problem for positional SBH; (3) the worst 
mismatch position is, as expected, the one distal from the phosphodiester bond formed 
in the ligation reaction. However, any mismatches that survive in this position will be 
eliminated by a polymerase extension reaction, such as described herein, provided that 
a polymerase is used, like Sequenase version 2, that has no 3'-exonuclease activity or 
terminal transferase activity; and (4) gel electrophoresis analysis has confirmed that the 
putative ligation products seen in these tests are indeed the actual products synthesized. 
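
The match/mismatch discrimination for each overhang can be expressed as the ratio of the two ligation efficiencies. The sketch below uses the values of Table 3, reading the entries printed as "030" and "031" as 0.30 and 0.31, which is an assumption about the original table. The helper names are illustrative.

```python
# (match efficiency, mismatch efficiency) for each matched overhang in Table 3.
# The 0.30 and 0.31 entries are assumed readings of damaged table cells.
TABLE_3 = {
    "GGCCC": (0.30, 0.03),
    "AGCCC": (0.36, 0.02),
    "AGCTC": (0.17, 0.01),
    "AGATC": (0.24, 0.01),
    "ATATC": (0.17, 0.01),
    "ATATT": (0.31, 0.02),
}


def at_content(overhang):
    """Number of A or T bases in the overhang."""
    return sum(base in "AT" for base in overhang)


def discrimination(match, mismatch):
    """Ratio of matched to mismatched ligation efficiency."""
    return match / mismatch


for overhang, (match, mismatch) in TABLE_3.items():
    ratio = discrimination(match, mismatch)
    print(overhang, f"{at_content(overhang)}/5", round(ratio, 1))
```

Under these readings every overhang discriminates by roughly a factor of ten or more, with no obvious trend in AT content, which is the point made in items (1) and (2) above.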

Table 2 

Ligation Efficiency of Matched and Mismatched Duplexes 
in 0.2 M NaCl at 37°C 

                           3'-TCG AGA ACC TTG GCT-5'      (SEQ ID NO 1) 
      CTA CTA GGC TGC GTA GTC-5'                          (SEQ ID NO 2) 

                                      Ligation Efficiency 
5'-B- GAT GAT CCG ACG CAT CAG AGC TC        0.170         (SEQ ID NO 3) 
5'-B- GAT GAT CCG ACG CAT CAG AGC TT        0.006         (SEQ ID NO 4) 
5'-B- GAT GAT CCG ACG CAT CAG AGC TA        0.006         (SEQ ID NO 5) 
5'-B- GAT GAT CCG ACG CAT CAG AGC CC        0.002         (SEQ ID NO 6) 
5'-B- GAT GAT CCG ACG CAT CAG AGT TC        0.004         (SEQ ID NO 7) 
5'-B- GAT GAT CCG ACG CAT CAG AAC TC        0.001         (SEQ ID NO 8) 

Table 3 

Ligation Efficiency of Matched and Mismatched Duplexes 
in 0.2 M NaCl at 37°C and its Dependence on AT Content of the Overhang 

            Overhang Sequences    AT Content    Ligation Efficiency 
Match       GGCCC                 0/5           0.30 
Mismatch    GGCCT                               0.03 
Match       AGCCC                 1/5           0.36 
Mismatch    AGCTC                               0.02 
Match       AGCTC                 2/5           0.17 
Mismatch    AGCTT                               0.01 
Match       AGATC                 3/5           0.24 
Mismatch    AGATT                               0.01 
Match       ATATC                 4/5           0.17 
Mismatch    ATATT                               0.01 
Match       ATATT                 5/5           0.31 
Mismatch    ATATC                               0.02 

Table 4 

Increasing Discrimination by Sequencing Extension at 37°C 

L igation Effirirnn^ Ligation K^.n.;». 

5'-B- GATGATCCGa?gStSgaGAT^''°'^ 
gE0iDNO9) ^^CAGAGATC 024 

GAT GAT CCG ACG CAT CAG Arr -rr 

(SEO ID NO 10) TT aoi ^ 

Discrimination = 

^4 X42 



5'-B- 



250 
xll8 



(SEQ ID lyo 1) 



5'-B- 
5'-B- 



CIA CTA GGciGC oik GTC ^w?/^^ ""^^ "^^ GCT-S" 



(SEQ ID NO 12) 



0.17 


12^0 25,200 


0.01 


240 3^ 


xl7 


x51 x65 


B = Biotin 


* = radioactive label 



Discrimination = 



internal mismatch (Table 4) a ^- u • discnmmate) as with an 

is bcuer than u.e ^ co JlaltH 1 ^ 

Aliele-spedfic amplification by At ch,in 

May « al, ftoc Natl. Acad. Sci. USA 88:189-93, 1991). 

Example 6 

— rrr rr 
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po,en„al .„f„r„,ado„ f„«y avaUable fr„„ ,he aaay. A variation i„ u,e 
proccdu.es wiU ... a^y „„„ ^ J 

Here, before hybridization to the probe array, the 5'-labeled (or unlabeled) 
nucleic acid is partially degraded with an enzyme such as exonuclease III. Digestion 
produces a large number of molecules with a range of chain lengths that are 
then hybridized to the probe array. Assuming that the distribution of 3'-ends is 
sufficiently broad, the hybridization pattern should allow the sequence of the 
target to be read subject to any branch point ambiguities. If a single set of exonuclease 
conditions fails to provide a broad enough distribution, samples prepared under 
several different conditions could be combined. 

positional SB^r' '"I "''^ ^""'^ 
positional SBH. He easiest, but ultin«tely probably .he leas, satisfacoty is to use 

— ase Ulce exonuclease m, by analogy to nested deletion doningt LI" 
«,uenc.g (S. Henilcoff, Oene .3:351-5,. h, , J^^^^^ 

may not produce an even enough yield of compounds to fully represent the 
sample of interest. One sees a pattern of regions in the sequence where the enzyme 
moves relatively rapidly, and others where it moves relatively slowly. 
Commercially available enzymes can be examined by looking at the distribution of 
fragment lengths directly on ordinary polyacrylamide DNA sequencing gels. 

MaxanvGUbert sequenong chenusny. is p<^Me to ligate ,he 5-.phospbo.yla.!I 
^^cnts whtch result .otn these cben^ca, degradations. Indeed titis . .he p^^ 

^frhgation..edia.edgenonticDNAsequencing(aP.Pfiefere.al.,ScL24:^^^^ 
1939 . Asynnnetnc PCR or linear anapUfication can be used to make th 

pre-select which base to cleave after, and this provides additional information about the 
DNA sequences one is working with. 

The third approach to making nested samples is to use variants on 
plus/minus sequencing. For example, one can make a very even DNA sequencing 
ladder by using Sanger sequencing with a dideoxy-pppN terminator. This does not 
produce a ligatable end. However, it can be replaced with a ligatable end, while still on 
the original template, by first removing the ddpppN with the 3' editing-exonuclease 

This accomplishes two things for the price of one. Not only does it generate 
a ligatable end on the ladder, but because one can pre-determine the identity of the base 
removed, it provides an additional nucleotide of DNA sequence information. One can 
use single color detection in four separate reactions, or ultimately, four color detection 
by mixing the results of four separate reactions prior to hybridization. If this approach 

hybridization. Note that each of these procedures combines some of the power of 
ladder sequencing with the parallel processing of SBH. 

samples such"" ^'T"""" ''"™*'™ """"^ °' ~« ^--^ 
samples, such as polymenzation in the absence of limiting amounts of one of the 

™^sn.tebase. such asforDN^oneof the four dN^^StandardSangeror M^ 
G. ber. sequenctng protocols cannot be used togenerate the ladder of DNA fragments 

~ """""^ Of tb! pLer 

. .d^ „g ^ ^^^^ ^^^^^^ ^^^^^ ^ 

target DNA. T '''^ """" °' °' — '^^ 

a^rr T ' ^ ^ - ' 

f Z 1 ^ ' ™' "^""^ ~ '-■^ ■'^ Kl-ow flagmen. 

labeled ddX'"° ' ^ 

labeled ddNTPs one at a time, or a mixture of all four labeled with four different colors 
simultaneously, the identity of one additional nucleotide of the target nucleic acid can 
be determined as shown in Figure 5. Thus, an array of only 1024 probes would actually 
have the sequencing power of an array of 4096 hexamers, in other words, a 
corresponding four-fold gain for any length used. In addition, polymerases work well 
in solid state sequencing methodologies quite analogous to the type proposed herein. 

Example 7 

Retaining positional information in sequencing by hybridization. Inherent 
in the detection of just the 3'-terminal sequence of the target nucleic acid is the 
possibility of obtaining information about the distance between the sequence hybridized 
and a known reference point. Although that point could be arbitrary, the 5'-end of the 
intact target was used. The desired distance is then just the length of the DNA 
fragment that has hybridized to a particular probe in the array. In principle, there are 
two ways to determine this length. One is to length fractionate (5' labeled) DNA before 
or after the hybridization, ligation, and any DNA polymerase extension. Single DNA 
sequences could be used, but pools of many DNA targets used simultaneously or, 
alternatively, a double-labeled target with one color representing the 5'-end of any 
unique site and the other a random internal label would be more efficient. For 
example, a fractional amount, for example, about 1%, of biotinylated (or 
digoxigenin-labeled) pyrimidines is incorporated into the target and used later on for 
fluorescent detection. It has been recently shown that an internal label is effective in high 
sensitivity conventional ladder DNA sequencing. The ratio of the internal label to the 
end label is proportional to target fragment length. For any particular sample the 
relationship is monotonic even though it may be irregular. Thus, correct order is always 
obtained even if distances are occasionally distorted by extreme runs of purines or 
pyrimidines. If necessary, it is also possible to use two quasi-independent internal 
labeling schemes. 
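
The use of the label ratio can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the invention: `labels_per_base` is a hypothetical calibration constant (average internal labels per base of target), and signals are assumed to be linear in the amount of label. Because the relationship only needs to be monotonic, ordering the probes requires the ratio alone.

```python
def estimate_length(internal_signal, end_signal, labels_per_base):
    """Estimate fragment length from the internal-to-end label ratio.

    Each fragment carries one end label and, on average,
    labels_per_base * length internal labels (an assumed calibration).
    """
    return (internal_signal / end_signal) / labels_per_base


def order_probes(spots):
    """Order probe spots by increasing internal/end ratio.

    spots: list of (probe_name, internal_signal, end_signal) tuples.
    """
    return [name for name, internal, end in
            sorted(spots, key=lambda s: s[1] / s[2])]


# Hypothetical readings for three probe spots.
spots = [("probe_C", 1.8, 1.0), ("probe_A", 0.4, 1.0), ("probe_B", 1.1, 1.0)]
print(order_probes(spots))
print(estimate_length(2.0, 1.0, labels_per_base=0.01))
```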

The scheme as just outlined, used with polymerase extension, might 
require as many as 6 different colored labels; 2 on the target (5' and internal) and four 
on the probe extension (four ddNTPs). However the 5' label is unnecessary, since the 
3' extension provides the same information (providing that the DNA polymerase 
reaction is close to stoichiometric). The ddNTPs can be used one at a time. 
Therefore, the scheme could proceed with as little as two color detection, if necessary 
(Figure 7), and three colors would certainly suffice. 

A scheme complementary to that shown in Figure 7 would retain positional 
information while reading the 5'-terminal sequence of 3'-end labeled plus internally 
labeled target nucleic acids. Here, as in Figure 3B, probe arrays with 5' overhangs are 
used, however, polymerase extension will not be possible. 



Example 8 



Resolution of branch point ambiguities. In current SBH, branch point 
ambiguities caused by sequence recurrences effectively limit the size of the target DNA 
to a few hundred base pairs. The positional information described in Section 6 

DNA iadd , ,s used as .he santple, two or ntore targets wiU hybridize to the same probe 

and w,U rndtcate the sequence of one base outside the recurrence. TTe easiest 
way 0 portion the n.o recurrent set^ences is ,o ehminate the longer or shorter 
mem^rs of t^e DNA ladder and hybridize remaining species to the probe ar^a^ ^s 

It sZ r H °^ '° ~ ^ way. 

Lht! .r '° ^"^^i" 'adder 

Wthough hts could certainly be done if needed), tastead, one can cut an end-,ab=,ed 

eZes s 'ihT™'™ " " ^"-e ^P-' 

enzymes should be used, singly or in combination. 

Additional information is available for the recurrence of pentanucleotide 
sequences by the use of polymerase and single base extension as described in Example 
7. In three cases out of four the single additional base will be different for the two 
recurrent sequences. Thus, it will be clear that a recurrence has occurred. 

The real power of the positional information comes, not from its 
application to the recurrent sequences, but from its application to surrounding unique 
sequences. Their order will be determined unequivocally, assuming even moderately 
accurate position information, and thus, the effect of the branch point will be 
eliminated. For example, 10% accuracy in intensity ratios for a dual labeled 200 base 
pair target will provide a positional accuracy of 20 base pairs. This would presumably 
be sufficient to resolve all but the most extraordinary recurrences. 

.ff , ,■ ""^ and 

effectively limit the size of the target nucleic acid to a few hundred base pairs. 
However, positional information derived from Example 7 will resolve almost all of these 
ambiguities. If a sequence recurs, more than one target fragment will hybridize to, or 
otherwise be detected by subsequent ligation to or extension from, a single immobilized 

slence^' ^ ^^^^ ~' 

sequence. For a sequence which occurs just twice, the true location is symmetric around 
the apparent one. For example, the apparent position of a recurrent sequence occurring 
in positions 50 and 100 bases from the 5'-end of the target will be 75 bases from the 
end. However, when the pattern of positional sequencing by hybridization is examined, 

neirr rf::" " "^^^^ --^ — - 

neighborhood of 50 bases and 100 bases from the 5'-end. This will indicate that a 
repeat has occurred. 
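
The apparent position of a recurrent sequence can be illustrated with a small calculation. The sketch assumes that each occurrence contributes equally to the signal at the shared probe, so that the apparent position is the plain average of the true positions; unequal yields would shift the average. The second helper restates the positional accuracy estimate (10% of a 200 base pair target). Both function names are illustrative.

```python
def apparent_position(true_positions):
    """Apparent distance from the 5'-end for a probe that captures
    several fragments, assuming equal signal from each occurrence."""
    return sum(true_positions) / len(true_positions)


def positional_uncertainty(target_length, fractional_accuracy):
    """Positional accuracy in bases for a given accuracy of the intensity ratio."""
    return target_length * fractional_accuracy


print(apparent_position([50, 100]))
print(positional_uncertainty(200, 0.10))
```

For a sequence occurring twice, the two true positions lie symmetrically about the value returned by `apparent_position`.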

Example 9 

Extending the 3' sequence of the target. Using the scheme shown in 
Figure 8, it is possible to learn the identity of the base 3' to the known sequence of the 
target, as revealed by its hybridization position on an oligonucleotide array. For 
example, an array of 4^n single-stranded overhangs of the type NAGCTA 3', as shown in 
the Figure, is created wherein n is the number of known bases in an overhang of 
length n+1. The target is prepared by using a 5' label in the manner shown in Figure 
3. The key step in this scheme is to use the Klenow fragment of DNA polymerase to 
add a single dpppNp as a polymerization chain terminator. Alternatively, ddpppN 
terminators replaced with ligatable ends may also be used. Before hybridization, the 
resulting 3'-terminal phosphate is removed by alkaline phosphatase. This allows for the 
subsequent ligation of the target to the probe array. Either by four successive single 
color 5' labels, or a mixture of four different colored chains, each color corresponding 
to a particular chain terminator, one is able to infer the identity of the base that had 
paired with the N next to the sequence AGCTA. The 5' end is labeled to minimize 
interference of fluorescent base derivatives with the ligation step. Presumably, provided 
with a supply of dpppNp or ribo-pppNp, which can be easily prepared, Sequenase 
version 2 or another known polymerase will use these as a substrate. 

Assuming that there are sufficient colors in a polychromatic detection 
scheme, this 3' target extension can be combined with the 3' probe extension to read 
n+2 bases in an array of complexity 4^n. This is potentially quite a substantial 
improvement. It decreases the size of the array needed by a factor of 16 without any 
loss in sequencing power. However, the number of colors required begins to become 
somewhat daunting. In principle one would want at least nine, four for each 3' 
extension and one general internal label for target length. However, with resonance 
ionization spectroscopy (RIS) detection, eight colors are available with just a single type 
of metal atom, and many more could be had with just two metals. 
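
The factor of 16 follows from each single-base extension replacing one array position, and therefore a factor of 4 in array size. A short sketch of the arithmetic, with illustrative function names:

```python
def probes_required(bases_read, extension_bases=0):
    """Array size needed to read `bases_read` bases per site when
    `extension_bases` of them come from single-base extensions."""
    return 4 ** (bases_read - extension_bases)


def array_size_reduction(extension_bases):
    """Factor by which the array shrinks for a fixed number of bases read."""
    return 4 ** extension_bases


print(probes_required(6, extension_bases=1))  # probe extension only
print(probes_required(7, extension_bases=2))  # probe and target extension
print(array_size_reduction(2))
```

With one extension, 1024 probes read six bases (the power of 4096 hexamers); with both extensions, the same 1024 probes read seven bases, a 16-fold reduction relative to a full heptamer array.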

Example 10 

Extending the 5' sequence of the target. In example 5, it was illustrated 
that by polymerase extension of the 3'-end of the probe, a single additional nucleotide 
on the target could be determined after ligation. That procedure used only chain 
terminators. Fluorescent labeled dNTPs that serve as substrates for DNA polymerase 
and other enzymes of DNA metabolism can also be made. The probe-target complex 
of each ligation reaction with, for example, three labeled dNTPs and a fourth unlabeled 
chain terminator could be extended using fluorescent labeled dNTPs. This could be 
repeated, successively, with each possible chain terminator. If the ratio of the intensities 
of the different labels can be measured fairly accurately, a considerable amount of 
additional sequence information will be obtained. If the absolute intensities could be 
measured, the power of the method appears to be very substantial since one is in 
essence doing a bit of four color DNA sequencing at each site on the oligonucleotide 
array. For example, as shown in Figure 9, for the sequence (Pu)4T, such an approach 
would unambiguously reveal 12 out of the 16 possible sequences and the remainder 
would be divided into two ambiguous pairs. Alternatively, once the probe array 
has captured target DNAs, full plus-minus DNA sequencing reactions could be carried 
out on all targets. Single nucleotide DNA addition methods have been described that 
would also be suitable for such a highly parallelized implementation. 

Example 11 

Sample pooling in positional sequencing by hybridization. A typical 200 
base pair target will detect only 196 probes on a five base 1024 probe array. This is not 
far from the ideal case in single, monochromatic sampling where one might like to 
detect half the probes each time. However, as the procedure is not restricted to single 
colors, the array is not necessarily this small. With an octanucleotide array, in 
conventional positional sequencing by hybridization or one of its herein described 
enhancements, the target detects only 1/32 of the immobilized probes. To increase 
efficiency a mixture of 16 targets can be used with two enhancements. First, intelligently 
constructed orthogonal pools of probes can be used for mapping by hybridization. 
Hybridization sequencing with these pools would be straightforward. Pools of targets, 
pools of probes, or pools of both can be used. 
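
The figure of 196 probes is the number of overlapping 5-base windows in a 200 base target (200 - 5 + 1). It is an upper bound, reached only if no 5-base sequence recurs in the target. A minimal sketch, with illustrative function names and a deliberately repetitive toy sequence:

```python
def max_probes_detected(target_length, k=5):
    """Upper bound on distinct k-base probes detected by one target."""
    return target_length - k + 1


def probes_detected(target, k=5):
    """Distinct k-base sequences actually present in the target."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


print(max_probes_detected(200))
# A highly repetitive 200-base toy target detects far fewer probes.
print(len(probes_detected("ACGT" * 50)))
```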

Second, in the analysis by conventional sequencing by hybridization of an 
array of 2 x 10^4 probes, divided into as few as 24 pools containing 8 x 10^2 probes each, 
there is a great deal of redundancy. Excluding branch points, 24 hybridizations could 
determine all the nucleic acid sequences of all the targets. However, using RIS 
detection there are many more than 24 colors. Therefore, all the hybridizations plus 
appropriate controls could be done simultaneously, provided that the density of the 
nucleic acid sample were high enough to keep target concentration far in excess of all 
the probes. A single hybridization experiment could produce 4 x 10^6 base pairs of 
sequence information. An efficient laboratory could perform 25 such hybridizations in 
a day, resulting in a throughput of 10^8 base pairs of sequence per day. This is 
comparable to the speed of polymerization by E. coli DNA polymerase. 
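
The throughput estimate can be reproduced with the following sketch. It assumes 2 x 10^4 targets of 200 base pairs each per hybridization and 25 hybridizations per day; these values are an interpretation of the figures in this example, and the function name is illustrative.

```python
def throughput(n_targets, target_length_bp, hybridizations_per_day):
    """Return (base pairs per hybridization, base pairs per day)."""
    per_hybridization = n_targets * target_length_bp
    return per_hybridization, per_hybridization * hybridizations_per_day


per_hyb, per_day = throughput(n_targets=20_000,
                              target_length_bp=200,
                              hybridizations_per_day=25)
print(per_hyb, per_day)
```

Under these assumptions one hybridization yields 4 x 10^6 base pairs and a day yields 10^8 base pairs.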

Example 12 

Oligonucleotide ligation after target hybridization. Stacking hybridization 
without ligation has been demonstrated in a simple format. Eight-mer oligonucleotides 
were annealed to a target and then annealed to an adjacent 5-mer to extend the 
readable sequence from 8 to 13 bases. This is done with small pools of 5-mers 
specifically chosen to resolve ambiguities in sequence data that has already been 
determined by ordinary sequencing by hybridization using 8-mers alone. The method 
appears to work quite well, but it is cumbersome because a custom pool of 5-mers must 
be created to deal with each particular situation. In contrast, the approach taken herein 
(Figure 9), after ligation of the target to the probe, is to ligate a mixture of 5-mers 
arranged in polychromatically labeled orthogonal pools. For example, using 5-mers of 
the form pATGCAp or pATGCddA, only a single ligation event will occur with each 
probe-target complex. These would be 3' labeled to avoid interference with the ligase. 
Only ten pools are required for a binary sieve analysis of 5-mers. In reality it would 
make sense to use many more, say 16, to introduce redundancy. If only four colors are 
available, those would require four successive hybridizations. Sixteen colors, for 
example, would allow a single hybridization. But the result of this scheme is that one reads 
ten bases per site in the array, equivalent to the use of 4^10 probes, but one only has to 
make 2 x 4^5 probes. The gain in efficiency in this scheme is a factor of 500 over 
conventional sequencing by hybridization. 
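
Ten pools suffice because 1024 = 2^10: each 5-mer can be given a 10-bit index, and pool b holds every 5-mer whose index has bit b set. The sketch below is one possible pool design, offered as an assumption since the text does not specify how the pools are built. It relies on a single 5-mer ligating at each site, as stated above.

```python
from itertools import product

BASES = "ACGT"
PENTAMERS = ["".join(p) for p in product(BASES, repeat=5)]  # 1024 five-mers
N_POOLS = 10  # 2**10 == 1024


def pools_for(index):
    """Pools (bit positions) that contain the 5-mer with this index."""
    return [b for b in range(N_POOLS) if (index >> b) & 1]


def decode(positive_pools):
    """Recover the ligated 5-mer from the set of pools that gave a signal."""
    return PENTAMERS[sum(1 << b for b in positive_pools)]


# Note: index 0 (AAAAA) belongs to no pool and gives no signal at all,
# one reason the redundancy of extra pools suggested in the text is useful.
index = PENTAMERS.index("ATGCA")
print(pools_for(index), decode(pools_for(index)))

# Efficiency gain: 4**10 probes replaced by 2 * 4**5.
print(4 ** 10 // (2 * 4 ** 5))
```

The last line gives 512, in line with the stated gain of about 500.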



Example 13 

Synthesis of custom arrays of probes. Custom arrays of probes would be 
useful to detect a change in nucleic acid sequence, such as any single base change in a 
pre-selected large population of sequences. This is important for detecting mutations, 
for comparative sequencing, and for finding new, potentially rare polymorphisms. One 
set of target sequences can be customized to an initial general array of nucleic acid 
probes to turn the probe into a specific detector for any alterations of a particular 
sequence or series of sequences. The initial experiment is the same as outlined above 
in Example 4, except that the 3'-blocked 5-mers are unlabeled. After the ligation, the 
initial nucleic acid target strand along with its attached 18 nucleotide stalk is removed, 
and a new unligated 18 nucleotide stalk annealed to each element of the immobilized 
array (Figure 11). The difference is that because of its history, many (ideally 50% or 
more) of the elements of that array now have 10 base 3' extensions instead of 5 base 
extensions. These do not represent all 4^10 possible 10-mers, but instead represent just 
those 10-mers which were present in the original sample. A comparison sample can 
now be hybridized to the new array under conditions that detect single mismatches in 
a decanucleotide duplex. Any samples which fail to hybridize are suspects for altered 
bases. 

caused by oncogenesis or environmental insult, might be easily detectable. 



Example 14 



Positional sequencing by hybridization. Hybridization was performed using 
probes with five and six base pair overhangs, including a five base pair match, a five 
base pair mismatch, a six base pair match, and a six base pair mismatch. These 
sequences are depicted in Table 5. 



Table 5 

Test Sequences: 

5 bp overlap, perfect match: 

       3'-CTA CTA GGC TGC GTA GTC TCG AGA ACC TTG GCT-5'   (SEQ ID NO 2) (SEQ ID NO 1) 
5'-biotin-GAT GAT CCG ACG CAT CAG AGC TC-3'                (SEQ ID NO 3) 

5 bp overlap, mismatch at 3' end: 

       3'-CTA CTA GGC TGC GTA GTC TCG AGA ACC TTG GCT-5'   (SEQ ID NO 2) (SEQ ID NO 1) 
5'-biotin-GAT GAT CCG ACG CAT CAG AGC TT-3'                (SEQ ID NO 4) 

6 bp overlap, perfect match: 

       3'-CTA CTA GGC TGC GTA GTC TCG AGA ACC TTG GCT-5'   (SEQ ID NO 2) (SEQ ID NO 1) 
5'-biotin-GAT GAT CCG ACG CAT CAG AGC TCT-3'               (SEQ ID NO 13) 

6 bp overlap, mismatch four bases from 3' end: 

       3'-CTA CTA GGC TGC GTA GTC TCG AGA ACC TTG GCT-5'   (SEQ ID NO 2) (SEQ ID NO 1) 
5'-biotin-GAT GAT CCG ACG CAT CAG AGT TCT-3'               (SEQ ID NO 14) 
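
The match and mismatch designations in Table 5 can be checked against the sequence listing. The following sketch is not part of the original procedure; it takes SEQ ID NOS 1-4, 13 and 14 as given in the listing and assumes that the probe overhang pairs antiparallel with the 3' end of the target (SEQ ID NO 1). The function names are hypothetical.

```python
# Sequences are written 5'->3' as in the sequence listing.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

TARGET = "TCGGTTCCAAGAGCT"     # SEQ ID NO 1
STALK = "CTGATGCGTCGGATCATC"   # SEQ ID NO 2 (18-mer complement of the constant region)
PROBES = {
    "SEQ ID NO 3": "GATGATCCGACGCATCAGAGCTC",
    "SEQ ID NO 4": "GATGATCCGACGCATCAGAGCTT",
    "SEQ ID NO 13": "GATGATCCGACGCATCAGAGCTCT",
    "SEQ ID NO 14": "GATGATCCGACGCATCAGAGTTCT",
}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def overhang_mismatches(probe, stalk=STALK, target=TARGET):
    """Mismatch positions in the probe overhang, counted from the probe 3' end (1-based)."""
    duplex_len = len(stalk)
    # The constant region of the probe must pair with the 18-mer.
    assert probe[:duplex_len] == reverse_complement(stalk)
    overhang = probe[duplex_len:]
    # A perfectly matched overhang is the reverse complement of the target 3' end.
    expected = reverse_complement(target[-len(overhang):])
    n = len(overhang)
    return [n - i for i, (a, b) in enumerate(zip(overhang, expected)) if a != b]

for name, probe in PROBES.items():
    print(name, len(probe) - len(STALK), "base overhang, mismatches at:",
          overhang_mismatches(probe))
```

An empty list means a perfect match; a position of 1 is the 3'-terminal base of the probe.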
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I «'^'«' ""^otic beads (DvmJ) co««. °' ™'""'«P«™. Polystyrene- 

i "Muto. After ligadon, tte samples were „,h- . '™P"an.re for 30 

»«P=n>a«„, ,0 U,e total amount of ' ^ ™' i" "he ho. 

^ "~ons,nusn,atel,ed.argetse,uencesw?r'rer ^' '''' 

i" the cold washes. Under ,k. .._ ■ ' . ""'"-'^""'"""ledorwere 

removed 

-.aled and hgated to the ^TITZT' '^""^ 

probe oligooudeodde. Tlis oUgonudeotid, ""-"'"'i'ytaed 

"ad been ligated ,o the probe ~' '"'""^ "» "-ge. 

Example 15 

suggested im^Ii^^^^^^^^^^^f^f^^^^^^^ A major problem in all 

composilo,. and. at least in some c^e onT" '^'^ °' °" 

Uke '^"^ t^^^ «1"ence. He nse of m>„sual salts 

tettamed,,! ammonium halides or be.ainesOV.A.R„. , , „ 

.SPajoffetsoneapproachtominimizingthes^!^,^?: ^'"^'^'^ ^^•'37^. 
2.6.diamino purine and 5-br„mo V ^'"^'=ly. base analogs li*e 

increase the stability of A-T base paris J T °^ ^' '«P«°i«ly >o 

--ease .be stability of G-C 1~ '"^'^^ ^ ^ •» 

indicate dtat the use of enzymes will „• *<»™ it" Table 2 

sequences, "nus gives thHToa^ °' "> ^ 

mediods which require afferent a,lt '"I -on-enzymadc 

toGCcontent. 'o'"*-"-^" and are highly ZThed 
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Another method to compensate for differences in stability is to van. . 
base next to the stacking <:itP p^, • ^^^Duiiy is to vary the 

ligation discrimination. Base analogs such as dU (deoxyuridine) and 7-deazaG are also 
tested as components of the target DNA to see if they can suppress effects of secondary 
structure. Single-stranded binding proteins may also be helpful in this regard. 



Example 16 



.Dal^ measiir.-inent. nrn,-...^„. ■ 

%bHdi..on a.. „<,„.ed J J ~dl 1 ^ """""" 

have been „sed in prior SBH effom CCD " * °' '^'^ 

P.a.e analyzes J ,ad Je.ld 2^ Te n. T""' "'"^ ""^^^ 

limited .0 on,, n.0 color analys. of DKa!1T f 7"" " 

-i«.e„..,in.a.ed..o„^^ ..irrr:, ;:^! - 
developed, the detection nf . '^^^^ ^^ss ^e" 

sources ar^d ^1^ ;:: I? '"'"^ ' ^ 

^spr^eLr i:r rrr r~ "~ 

red dyes are used Ho "^^^ ^^^^^^^We if infra- 

-e ,0 work Tie 0^ !^ " ^^^^ - 

in^oduced in.o .ar,e.t ITJ; " "'^ ^ ^= — ^-e' ^' 
PCR primers for 5 lab^ ^ ^ 5' 

labeled branll fl 7 "T' ^''^ '^'"'^ "^"'--"-^ " 

DN.endra:::"'"'^""^'"-'''^^ 

CD camera can deal w„t „e san,e TTFF 8 bi, da,a formare. T,^. software 
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developed for either instrument can be used to h.nHi w 

instruments, ^s wil, save a great deal of °" 

-are. Sequence interpretir^^^^^^ 

chip data and assembling it into conti. ^ ^"''"^"^S 

a. ^.o.e .... ^:z:r^^zzrz:T' ^ 

available i„ U,e interested user co_ 1 ll I , ^^'^ " ^'""^^ 
samples that may ultimatelv h« „c.h u ■ ^ onhogonal pools of 

a^o„ee.e../e.::::rj!:;r:^^^^^^^^ 

Example 17 

S-reptavidi. were washed twice «i,hr, J^^ ^""^ 
concentration of beads was^blTs 2 ^ "—oncf 5,ng/n,). Ffaal 

a .Ota, volume of 80. l^^ 'oT"'"^ ^""^ ""^^^^ 

o.Sowereadded.dtben^es^en.Xf„:;:::r:a:t— 

Table 6 

MPROBEC 12W iTI^™ 2l)<.linlml 

MPROBEG 94l8 i,?^™ 16».linlml 

MPROBET ,47:1 ;^„°^™' JMinin. 

464,000pmol 5^1 in 185,,^ 
Tubes were placed in the Dynal MPC apparatus and the supernatant 
removed. Unbound streptavidin sites were sealed with 5 µl of 200 µM free biotin in 
water. The beads were washed several times with 80 µl TE. These beads can be stored in this state 
at 4°C for several weeks. 

250 nM of 5'-biotinylated 18 base nucleic acid (the complement of the 
constant region) served as primer for enzymatic extension of the probe region. The tube 
was heated to 68°C and allowed to cool to room temperature. Beads were kept in 
suspension by tipping gently. Supernatant was removed and the beads washed with 40 µl TE 
several times. The tube was removed from the magnet and the beads resuspended in 
40 µl of TE to remove excess complement. The bead suspension was equally divided 
among 4 tubes and the stock tube washed with the wash divided among the tubes as 
well. Supernatant was removed and the beads washed with water. Each tube contained about 2-5 
pmol of DNA (28-72 ng; see Table 6). 

Polymerase I extension was performed on each tube of DNA in a total of 
13 ^1 as foUows (see Table 7): NEB buffer concentration was lOmM Tris-HCl. pH 7 5 
5mM Mga, 7.5mM DTT; 33^M d(N-N^TP mix; 2,M ^ dN JP complimentary to 
one of the N, bases; and polymerase I large fragment (klenow). In the first weU was 
added dTTP. dCIP and dGTP, to a concentration of 33;.M. '^P-dATP was added to a 
concentration of S/xM. dNTP stock solutions of 200mM were pooled to lack the labelled 
nucleotide (i.e. Tube A contains QG and T) adding 63^1 dNTP. 5^1 lOO^M dNTP. and 
43^1 water. Radioactively labeled ('dNTP) stock solutions were 20^M prepared from 
2^1 [a"P] dNTP, 5^1 200mM dNTP, and 43^1 water. 










The tubes were incubated at 25 »r fn^ • 
of enzymatic extension, hi.h ^5 C for 15 nunutes. To optimize the yields 

conceLtion ^ ^Z ZZ^T ""'^ " ^^^^ ^ 

^i/tm. supernatant was removed flnHth-K j • 

of -re buffe. several .ine. and .e..pe„ded in SsZ 4 1 ' ""^ ""^ 

and i. was expected d,a. .here would be abl 8% 

Asa,e«„f,h . . """''^"""""Poraoon of the label added. 

^ Of o.:r;r ::rr r"''^'"-'''----- 

s~ .on. each .be T T 

inc„ba,edaseco„dUme„i.h50„„f0,MNaOHT 

•he firs, se. Of beads were heated .o es^^nZt^cT'l'"^" 

-nt. Hach base was ~eu.rah.ed Jh m HO ^^TJ^^^^ 

--wereadded.o.he.e,.eds^d:d:rd:::i:i:-— 
with gentle shaking. Supernatants were removed and saved for counting. The beads 
were washed several times with TE. Results are shown in Table 8. 

Table 8 

Incorooratinn nf loh ^j (MPRORFr S'-CATan^) 

A 28,711 / 779,480 

C 35,193 / 574,760 

G 15,335 / 754,400 

T 43,048 / 799,440 



10 A ^^i^ ^^f^ Unbelted Effidencv 

^ iii58 3^;gS X 1^-^^^ 



2,330 10,419 
'^•"^ 5^2 S 



Bound refers to synthesized strand captured on fresh beads. 
Unbound refers to the synthesized strand that was not captured by the bead and 
unmelted refers to counts remaining on the original beads. As can be observed, 
between about 43% and 58% of the newly synthesized strands were successfully 
transferred, indicating that an array of such strands could be successfully replicated. 
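
As a rough check on the label incorporation figures in Table 8, the ratio of the two counts listed for each tube can be computed as below. This is a sketch under a stated assumption: the table does not say so explicitly, but the first figure is taken here to be incorporated counts and the second the total counts added. The function name is hypothetical.

```python
# Counts copied from Table 8; treating them as (incorporated, total) is an assumption.
TABLE_8 = {
    "A": (28711, 779480),
    "C": (35193, 574760),
    "G": (15335, 754400),
    "T": (43048, 799440),
}

def percent_incorporated(incorporated, total):
    """Incorporated counts as a percentage of total counts."""
    return 100.0 * incorporated / total

for tube, (incorporated, total) in TABLE_8.items():
    print(f"Tube {tube}: {percent_incorporated(incorporated, total):.1f}% of counts")
```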

Example 18 

A procedure for making complex arrays by PCR. A slightly complex, but 
considerably improved scheme to test the generality of the new approach to SBH, 
without the need to synthesize, separately, all 1024 five-mer probes has been developed. 
This procedure allows one to generate arrays with 5'- and/or 3'-overhangs and uses PCR 
to prepare the final probes used for hybridization which may easily be labeled with 
biotin. It also builds in a way of learning part or even all of the identity of each probe 
sequence. 
complementary duplexes. In the above sequences N represents an equimolar mixture 
of all 4 bases; R is an equimolar mixture of A and G; and Y is an equimolar mixture 
of T and C. The underlined sequences are BstXI and Hga I recognition sites. 

3 -AGATCrCGATCG-5' (SEQ ID NO 17) 



(a) 



(b) 



(b) 



primer 

primer 




-3' (SEQ ID NO 16) 
" (SEQ ID NO 20) 



can be covened to 5 h... r u -overhangmg smgle-strands which 

shown in Fimirp 

- er angs (see below) used for the type of positional SBH 

3.GGTN NNNNNACC-5' 

The ^^fl /-cutting site overlaos with thp v/ 
.i>e gencra.cn of 5 base S'-overtanging sin^^ ^'""^ 
.he we of posdcnal SBH sho»n I FiJ 2^^' 

i>.iNiNiNiN.5 3 -CTGCGNNNNNNNNNN-5' 
The 5'- and 3'-terminal sequences of strand (a) are also recognition sites 
for Sal I and Nhe I, respectively; the corresponding sequences in strand (b) are 
recognition sites for Xho I and Xma I, respectively: 

5'-GTCGAC-3'   Sal I   5'-G      TCGAC-3' 
3'-CAGCTG-5'    ->     3'-CAGCT      G-5' 

5'-GCTAGC-3'   Nhe I   5'-G      CTAGC-3' 
3'-CGATCG-5'    ->     3'-CGATC      G-5' 

5'-CTCGAG-3'   Xho I   5'-C      TCGAG-3' 
3'-GAGCTC-5'    ->     3'-GAGCT      C-5' 

5'-CCCGGG-3'   Xma I   5'-C      CCGGG-3' 
3'-GGGCCC-5'    ->     3'-GGGCC      C-5' 

Those cloning sites are chosen such that, even with the degeneracy allowed 
by the sequences 5'-YNNNNR-3' and 5'-RNNNNY-3', these enzymes will not cleave 
the probe regions. For cloning, duplexes (a) were cleaved with both Sal I and Nhe I 

'T" 0» wiU, A7„ / and , The resulting digestion 

p^^«.„ered.rectioual,clonedin.oanapprop.a.evec.or(e,.^ 
surrable ceUs were ,ran.,or„ed wi,h ,he vector, and colonies plated IndiVdu^ Cone 

:ri '^'^ — - 

ns fron, .dMdual clones were amplified by PCR wid. „ne biotinyU«d primer 
™d,ng ,0 .be 5-.bases of d.e bouon, srrand. In a separate PCR, ,he Janons 

1 Bs T 71° ~ ^ ~ - 

w.-. Bs XI and .he biotin-labeled products capntred on streptavindin beads or surfaces. 

Note that by using PCR amplification instead of DNA purification, the need to 
separately purify and biotinylate each clone is also eliminated. 
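
The statement that the cloning enzymes cannot cut within the degenerate probe regions can be verified by brute force. The sketch below is an illustration, not part of the original procedure: it expands every member of strand (a) (SEQ ID NO 15) and strand (b) (SEQ ID NO 16) and counts how often each recognition site occurs. The helper names are hypothetical; because all four sites are palindromic, only the top strand is scanned.

```python
from itertools import product

# IUPAC codes used in the sequence listing.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}

STRAND_A = "GTCGACAGTTGACGCTACCAYNNNNRTGGTCTAGAGCTAGC"  # SEQ ID NO 15
STRAND_B = "CTCGAGAGTTGACGCTACCARNNNNYTGGTCTAGACCCGGG"  # SEQ ID NO 16
SITES = {"Sal I": "GTCGAC", "Nhe I": "GCTAGC",
         "Xho I": "CTCGAG", "Xma I": "CCCGGG"}

def expand(degenerate):
    """Yield every concrete sequence represented by a degenerate sequence."""
    for combo in product(*(IUPAC[base] for base in degenerate)):
        yield "".join(combo)

def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of site in seq."""
    return sum(seq.startswith(site, i) for i in range(len(seq) - len(site) + 1))

def max_site_count(strand, enzyme):
    """Largest number of sites for an enzyme over all members of the pool."""
    return max(count_sites(seq, SITES[enzyme]) for seq in expand(strand))

for strand_name, strand, enzymes in (("(a)", STRAND_A, ("Sal I", "Nhe I")),
                                     ("(b)", STRAND_B, ("Xho I", "Xma I"))):
    for enzyme in enzymes:
        print(strand_name, enzyme, max_site_count(strand, enzyme))
```

A maximum of one site per enzyme means only the intended terminal site is ever present. The pairing of enzymes with strands matters: a Xho I site (5'-CTCGAG-3') fits the pattern 5'-YNNNNR-3' and so can arise in strand (a), which is why strand (a) is cloned with Sal I and Nhe I instead.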

PCR products were cleaved by Hga I, which generates 
5'-overhangs consisting of randomized sequences. The identity of each clone can then 
be determined by separate primer extensions of each of the two DNA pieces resulting 
from Hga I cleavage. For each pair of sequences, which derive from the same clone, 
the overhangs must be complementary. Therefore, sequencing just three bases on each 
fragment strand will give the entire structure of two probes. This plus/minus 
sequencing can be done in microtitre plates and is easily automated. It will fail only in 
the few cases where 5'-RNNNNY-3' in strand (b) contains 5'-GACGC-3', which is the 
recognition site for Hga I. The number of primer extension reactions required can be 
reduced by synthesis of more restricted pools of sequences. For example, using 4 pools 
where the base in one particular position is known in advance, such as 5'-YNNANR-3'. 
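
How few the failing cases are can be counted directly. The sketch below, with hypothetical helper names, enumerates the pool 5'-RNNNNY-3' and counts the members that contain the Hga I recognition sequence 5'-GACGC-3'; it also shows the size of a restricted pool such as 5'-YNNANR-3'. It considers only the six-base variable region, as the text does.

```python
from itertools import product

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}
HGA_I_SITE = "GACGC"

def expand(pattern):
    """Return every concrete sequence matching a degenerate pattern."""
    return ["".join(combo) for combo in product(*(IUPAC[b] for b in pattern))]

def members_containing(pattern, site):
    """Pool members that contain the given recognition sequence."""
    return [seq for seq in expand(pattern) if site in seq]

pool = expand("RNNNNY")
failures = members_containing("RNNNNY", HGA_I_SITE)
restricted = expand("YNNANR")

print(len(pool), "members,", len(failures), "contain", HGA_I_SITE, failures)
print(len(restricted), "members in the restricted pool YNNANR")
```

Fixing one position to a known base cuts each pool to a quarter of the full pool, so four such pools together cover the same sequences.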

To make the probes needed for positional SBH (as shown in Figure 2A), 
the duplex PCR products are first attached to a solid support through streptavidin. They 
are then cleaved with Bst XI to generate the following pairs of products: 

5'-B-GTCGACAGTTGACGCTACCAYNNNN-3'   (SEQ ID NO 25) 
3'-  CAGCTGTCAACTGCGATGGTR-5'       (SEQ ID NO 26) 

5'-B-GCTAGCTCTAGACCAYNNNN-3'        (SEQ ID NO 27) 
3'-  CGATCGAGATCTGGTR-5'            (SEQ ID NO 28) 

5'-B-CTCGAGAGTTGACGCTACCARNNNN-3'   (SEQ ID NO 29) 
3'-  GAGCTCTCAACTGCGATGGTY-5'       (SEQ ID NO 30) 

5'-B-CCCGGGTCTAGACCARNNNN-3'        (SEQ ID NO 31) 
3'-  GGGCCCAGATCTGGTY-5'            (SEQ ID NO 32) 
The 5 base 3' overhangs needed for positional SBH are made by replacing 
the non-biotinylated strands with constant strands which are one base 
shorter: 

5'-B-GTCGACAGTTGACGCTACCAYNNNN-3'   (SEQ ID NO 25) 
3'-  CAGCTGTCAACTGCGATGGT-5'        (SEQ ID NO 33) 

5'-B-GCTAGCTCTAGACCAYNNNN-3'        (SEQ ID NO 27) 
3'-  CGATCGAGATCTGGT-5'             (SEQ ID NO 34) 

5'-B-CTCGAGAGTTGACGCTACCARNNNN-3'   (SEQ ID NO 29) 
3'-  GAGCTCTCAACTGCGATGGT-5'        (SEQ ID NO 35) 

5'-B-CCCGGGTCTAGACCARNNNN-3'        (SEQ ID NO 31) 
3'-  GGGCCCAGATCTGGT-5'             (SEQ ID NO 36) 


™.h Se. *™ ' 3-.verba.gH,g a^ays amenable ,o ex,ens.o„ 

rl ' T ^-'P - and B 

T ' T ^''^ °"''^'> - '0 ensure ,ha. ^, „, ,he 

sequences ( > 99%) a.e p««„, bu. ,bis array is n,ucb larger *an op.in^. praeJ 
a l-bra. ^ „ee<. o.y pro.de appro,i..,e,y of .be se,ue Js and, . nece^ 
can be supplemented ,o fiU i„ d,e u,issing variable clones by direct syntbesis 

Skilled in ,he rr °' ^ parent ,„ those 

d.d„ ed bereta I, . intended tba. tbe specification and examples be considered 
SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: CANTOR, Charles 
PRZETAKIEWICZ, Marek 

(ii) TITLE OF INVENTION: POSITIONAL SEQUENCING BY HYBRIDIZATION 



(iii) NUMBER OF SEQUENCES: 36 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: BAKER & BOTTS, LLP 

(B) STREET: 555 13th Street, N.W., Suite 500 East 

(C) CITY: Washington 

(D) STATE: D.C 

(E) COUNTRY: U.S.A. 

(F) ZIP: 20004-1109 

(v) COMPUTER READABLE FORM- 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US 08/110691 

(B) FILING DATE: 23-AUG-1993 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 07/972,012 

(B) FILING DATE: 06-NOV-1992 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Remenick, James 

(B) REGISTRATION NUMBER: 36 902 

(C) REFERENCE/DOCKET NUMBER: 16865-0124 

(ix) TELECOMMUNICATION INFORMATION: 
(A) TELEPHONE: (202) 639-7721 
(B) TELEFAX: (202) 639-7832 

(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID N0:1: 
TCGGTTCCAA GAGCT 
(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 
CTGATGCGTC GGATCATC 
(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 
GATGATCCGA CGCATCAGAG CTC 
(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 
GATGATCCGA CGCATCAGAG CTT 
(2) INFORMATION FOR SEQ ID NO:5: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID N0:5: 
GATGATCCGA CGCATCAGAG CTA 
(2) INFORMATION FOR SEQ ID NO:6: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 
GATGATCCGA CGCATCAGAG CCC 
(2) INFORMATION FOR SEQ ID NO:7: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 
GATGATCCGA CGCATCAGAG TTC 
(2) INFORMATION FOR SEQ ID NO:8: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: 
GATGATCCGA CGCATCAGAA CTC 
(2) INFORMATION FOR SEQ ID NO:9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID N0:9: 
GATGATCCGA CGCATCAGAG ATC 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
GATGATCCGA CGCATCAGAG CTT 
(2) INFORMATION FOR SEQ ID N0:11: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: 
GATGATCCGA CGCATCAGAT ATC 
(2) INFORMATION FOR SEQ ID N0:12: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID N0:12: 
GATGATCCGA CGCATCAGAT ATT 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 
GATGATCCGA CGCATCAGAG CTCT 
(2) INFORMATION FOR SEQ ID N0:14: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID N0:14: 
GATGATCCGA CGCATCAGAG TTCT 
(2) INFORMATION FOR SEQ ID NO:15: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: 
GTCGACAGTT GACGCTACCA YNNNNRTGGT CTAGAGCTAG C 
(2) INFORMATION FOR SEQ ID N0:16: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: 
CTCGAGAGTT GACGCTACCA RNNNNYTGGT CTAGACCCGG G 



(2) INFORMATION FOR SEQ ID NO:17: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GCTAGCTCTA GA 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: 
GCTAGCTCTA GACCAYNNNN RTGGTAGCGT CAACTGTCGA C 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS - 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: 
CCCGGGTCTA GA 
(2) INFORMATION FOR SEQ ID NO:20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

CCCGGGTCTA GACCARNNNN YTGGTAGCGT CAACTCTCGA G 

(2) INFORMATION FOR SEQ ID N0:21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID N0:21: 
CCANNNNNNT GG 

(2) INFORMATION FOR SEQ ID NO:22: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 
CCANNNNNNT GG 

(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 
GACGCNNNNN NNNNN 

(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 
NNNNNNNNNN GCGTC 

(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 
GTCGACAGTT GACGCTACCA YNNNN 
(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 
RTGGTAGCGT CAACTGTCGA C 
(2) INFORMATION FOR SEQ ID NO:27: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 
GCTAGCTCTA GACCAYNNNN 
(2) INFORMATION FOR SEQ ID NO:28: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 
RTGGTCTAGA GCTAGC 
(2) INFORMATION FOR SEQ ID NO:29: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 
CTCGAGAGTT GACGCTACCA RNNNN 
(2) INFORMATION FOR SEQ ID NO:30: 

(i) SEQUENCE CHARACTERISTICS - 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: 
YTGGTAGCGT CAACTCTCGA G 
(2) INFORMATION FOR SEQ ID NO:31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: 
CCCGGGTCTA GACCARNNNN 
(2) INFORMATION FOR SEQ ID NO:32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: 
YTGGTCTAGA CCCGGG 
(2) INFORMATION FOR SEQ ID NO:33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 
TGGTAGCGTC AACTGTCGAC 
(2) INFORMATION FOR SEQ ID NO:34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: 
TGGTCTAGAG CTAGC 
(2) INFORMATION FOR SEQ ID NO:35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35: 
TGGTAGCGTC AACTCTCGAG 
(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 
TGGTCTAGAC CCGGG 
We Claim: 

1. A method for creating an array of probes comprising the steps of: 

a) synthesizing a first set of nucleic acids each comprising a constant 
sequence of length C at a 3'-terminus and a random sequence of length 
R at a 5'-terminus; 

b) synthesizing a second set of nucleic acids each comprising a sequence 
complementary to the constant sequence of each of the first nucleic acids; 
and 

c) hybridizing the first set with the second set to create the array. 

2. The method of claim 1 wherein the nucleic acids of the first set are each between 
about 15-30 nucleotides in length and the nucleic acids of the second set are each 
between about 10-25 nucleotides in length. 

3. The method of claim 1 wherein C is between about 7-20 nucleotides and R is 
between about 3-5 nucleotides. 

4. The method of claim 1 wherein the array comprises about 4^R different probes. 

5. The method of claim 1 wherein the array is fixed to a solid support and the solid 
support is selected from the group consisting of plastics, ceramics, metals, resins, gels, 
membranes and chips. 

6. An array of probes created by the method of claim 1. 

7. A method for creating an array of probes fixed to a solid support comprising the 
steps of: 

a) synthesizing a first set of nucleic acids each comprising a constant 
sequence of length C at a 3'-terminus and a random sequence of length 
R at a 5'-terminus; 

b) fixing the first set to the solid support; 

c) synthesizing a second set of nucleic acids each comprising a sequence 
complementary to the constant region of the first set; and 

d) hybridizing the nucleic acids of the first set with the second set to create 
the array. 
8. A method for creating an array of probes comprising the steps of: 

a) synthesizing an array of single-stranded nucleic acids each containing a 
constant sequence at the 3'-terminus, another constant sequence at the 5'- 
terminus, and a random internal sequence of length R flanked by the 
cleavage sites of a restriction enzyme; 

b) synthesizing an array of primers each complementary to a portion of the 
constant sequence of the 3'-terminus, hybridizing the two arrays together 
to form hybrids; 

c) extending the sequence of each primer by polymerization using a 
sequence of the nucleic acid as a template; and 

d) cleaving the extended hybrids with the restriction enzyme to form an array 
of probes with a double-stranded portion at one terminus, a single- 
stranded portion containing the random sequence at the opposite 
terminus. 

9. The method of claim 8 wherein the nucleic acids are each between about 10-50 
nucleotides in length. 

10. The method of claim 8 wherein R is between about 3-5 nucleotides in length. 

11. The method of claim 8 wherein the restriction enzyme is selected from the group 
consisting of restriction enzymes which produce 5'-overhangs and restriction enzymes 
which produce 3'-overhangs. 

12. The method of claim 8 wherein the array of probes is fixed to a solid support and 
the solid support is selected from the group consisting of plastics, ceramics, 
metals, resins, gels, membranes and chips. 

13. An array of probes created by the method of claim 8. 

14. A method for creating an array of probes comprising the steps of: 

a) synthesizing an array of single-stranded nucleic acids each containing a 
constant sequence at the 3'-terminus, another constant sequence at the 5'- 
terminus, and a random internal sequence of length R flanked by the 
cleavage sites of a restriction enzyme; 
b) synthesizing an array of primers with a sequence complementary to the 
constant sequence at the 3'-terminus; 

c) hybridizing the two arrays together to form hybrids; 

d) enzymatically extending the primers using the nucleic acids as templates 
to form full-length hybrids; 

e) cloning the full-length hybrids into vectors; 

f) amplifying the cloned sequences by multiple polymerase chain reactions; 
and 

g) cleaving the amplified sequences with the restriction enzyme to form the 
array of probes with a double-stranded portion at one terminus and a 
single-stranded portion containing the random sequence at the opposite 
terminus. 

15. The method of claim 14 wherein the array of probes have 5'- or 3'-overhangs. 

16. The method of claim 14 wherein the array of probes is fixed to a solid support 
and the solid support is selected from the group consisting of plastics, ceramics, metals, 
resins, polymers, films, gels, membranes and chips. 

17. An array of probes created by the method of claim 14. 

18. A method for detecting a nucleic acid in a biological sample comprising the steps 
of: 

a) creating an array of probes fixed to a solid support according to the 
method of claim 7; 

b) labeling the nucleic acid of the biological sample with a detectable label; 

c) hybridizing the labeled nucleic acid to the array; and 

d) detecting the sequence of the nucleic acid from a binding pattern of the 
label on the array. 

19. A method for identifying a target nucleic acid in a biological sample comprising 
the steps of: 

a) creating an array of probes fixed to a solid support according to the 
method of claim 7; 
b) labeling the target of the biological sample with a detectable label; 

c) hybridizing the labeled target to the array; and 

d) identifying the target from a binding pattern of the label on the array. 

20. The method of claim 19 wherein the detectable label is selected from the group 
consisting of radioisotopes, stable isotopes, enzymes, fluorescent and luminescent 
chemicals, chromatic chemicals, metals, electric charges, and spatial chemicals. 

21. The method of claim 19 wherein the nucleic acid identified is selected from the 
group consisting of nucleic acids derived from viruses, bacteria, parasites, fungi and 
yeast. 

22. The method of claim 19 wherein the binding pattern is a nucleic acid fingerprint. 

23. A diagnostic aid for detecting a target nucleic acid in a biological sample 
comprising the array of claim 19, a solid support on which the array is fixed, a 
detectable label, and the biological sample. 

24. The method of claim 19 wherein the biological sample is selected from the group 
consisting of samples of animal tissue, environmental substances, and manufacturing 
products and by-products. 

25. The method of claim 24 wherein the animal tissue is obtained from a human. 

26. The method of claim 19 further comprising the step of purifying the target 
nucleic acids identified. 

27. A method for replicating an array of single-stranded probes on a solid support 
comprising the steps of: 

a) synthesizing an array of nucleic acids each comprising a constant sequence 
of length C at a 3'-terminus and a random sequence of length R at a 5'- 
terminus; 

b) fixing the array to a first soUd support; 

c) synthesizing a set of nucleic acids each comprising a sequence 
complementary to the constant sequence; 

d) hybridizing the nucleic acids of the set with the array; 



e) enzymatically extending the nucleic acids of the set using the random 
sequences of the array as templates; 

f) denaturing the set of extended nucleic acids; and 

g) fixing the denatured nucleic acids of the set to a second solid support to 
create the replicated array of single-stranded probes. 

28. The method of claim 27 wherein the nucleic acids of the set are conjugated with 
biotin and the second solid support comprises streptavidin. 

29. The method of claim 27 wherein the nucleic acids of the array are between about 
15-30 nucleotides in length and the nucleic acids of the set are between about 10-25 
nucleotides in length. 

30. The method of claim 27 wherein C is between about 7-20 nucleotides and R is 
between about 3-5 nucleotides. 

31. The method of claim 27 wherein the solid support is selected from the group 
consisting of plastics, ceramics, metals, resins, gels, membranes and chips. 

32. The method of claim 27 wherein the nucleic acids of the set are enzymatically 
extended with a DNA polymerase and one or more deoxynucleotide triphosphates. 

33. The method of claim 27 wherein denaturing is performed with heat, alkali, 
organic solvents, binding proteins, enzymes, salts or combinations thereof. 

34. A replicated array of single-stranded probes made by the method of claim 27. 

35. The method of claim 27 further comprising the step of hybridizing the replicated 
array with a second set of nucleic acids complementary to the constant sequence of the 
replicated array to create a double-stranded replicated array. 

36. A replicated array of double-stranded probes made by the method of claim 35. 

37. A method for creating a probe comprising the steps of: 

a) synthesizing a plurality of first nucleic acids and a plurality of second 
nucleic acids comprising a random terminal sequence and a sequence 
complementary to a sequence of the first nucleic acids; 

b) hybridizing the first nucleic acids with the second to form partial duplexes; 

c) hybridizing a target nucleic acid to the partial duplexes; 

d) ligating the hybridized target to the first nucleic acid of the partial 
duplexes; 

e) isolating the second nucleic acid from the ligated duplexes; and 

f) synthesizing a plurality of third nucleic acids each complementary to the 
constant sequence of the second nucleic acid and hybridizing the third 
nucleic acids with the isolated second nucleic acids to create a probe. 

38. The method of claim 37 wherein the first nucleic acids are each between about 
15-25 nucleotides in length and the second nucleic acids are each between about 20-30 
nucleotides in length. 

39. The method of claim 37 wherein the target is hybridized to the partial duplexes 
under a single set of hybridization conditions. 

40. The method of claim 39 wherein the hybridization conditions comprise a 
temperature of between about 22-37°C, a salt concentration of between about 0.05-0.2 
M, and a time period of between about 1-14 hours. 

41. The method of claim 37 wherein a double-stranded portion of the partial duplex 
contains an enzyme recognition site. 

42. A probe created by the method of claim 37. 

43. The probe of claim 42 which is fixed to a solid support and the solid support is 
selected from the group consisting of plastics, ceramics, metals, resins, gels, membranes 
and chips. 

44. A diagnostic aid for the detection of a target nucleic acid in a biological sample 
comprising the probe of claim 42, a solid support on which the probe is fixed, a 
detectable label, and the biological sample. 

45. A method for creating a probe comprising the steps of: 

a) synthesizing a plurality of first nucleic acids and a plurality of second 
nucleic acids each comprising a random terminal sequence and a 
sequence complementary to the sequence of the first nucleic acids; 

b) hybridizing the first nucleic acids with the second nucleic acids to form 
partial duplexes having a double-stranded portion and a single-stranded 
portion with the random sequence within the single-stranded portion; 

c) hybridizing a target nucleic acid to the partial duplexes; 

d) ligating the hybridized target to the first nucleic acid of the partial duplex; 

e) hybridizing the ligated target with a set of oligonucleotides comprising 
random sequences; 

f) ligating the hybridized oligonucleotide to the second nucleic acid; 

g) isolating the oligonucleotide ligated second nucleic acid; and 

h) synthesizing another plurality of first nucleic acids and hybridizing the first 
nucleic acids with the isolated second nucleic acid to create the probe. 

46. The method of claim 45 wherein the first nucleic acids are each between about 
15-25 nucleotides in length, the second nucleic acids are each between about 20-30 
nucleotides in length, and the oligonucleotides are each between about 4-20 
nucleotides in length. 

47. The method of claim 45 wherein the target is hybridized to the partial duplexes 
under a single set of hybridization conditions. 

48. The method of claim 45 wherein the hybridization conditions comprise a 
temperature of between about 22-37°C, a salt concentration of between about 0.05-0.2 
M, and a time period of between about 1-14 hours. 

49. The method of claim 45 wherein the partial duplexes contain an enzyme 
recognition site. 

50. A nucleic acid probe created by the method of claim 45. 

51. The nucleic acid probe of claim 50 which is fixed to a solid support selected from 
the group consisting of plastics, ceramic, metals, resin, gel, membranes and chips. 

52. A diagnostic aid for the detection of a target nucleic acid in a biological sample 
comprising the probe of claim 45, a solid support on which the probe is fixed, a 
detectable label, and the biological sample. 

53. A method for creating a probe comprising the steps of: 

a) synthesizing a plurality of first nucleic acids and a plurality of second 
nucleic acids comprising a random terminal sequence and a sequence 
complementary to a sequence of the first nucleic acid; 

b) hybridizing the first nucleic acids to the second nucleic acids to form 
partial duplexes having a double-stranded portion and a single-stranded 
portion with the random nucleotide sequence within the single-stranded 
portion; 

c) hybridizing a target nucleic acid to the partial duplexes; 

d) ligating the hybridized target to the first nucleic acid of the partial duplex; 

e) enzymatically extending the second nucleic acid using the target as a 
template; 

f) isolating the extended second nucleic acid; and 

g) synthesizing another first nucleic acid and hybridizing the first nucleic acid 
with the isolated and extended second nucleic acid to create a probe. 

54. The method of claim 53 wherein the first nucleic acids are each between about 
15-25 nucleotides in length and the second nucleic acids are each between about 20-30 
nucleotides in length. 

55. The method of claim 53 wherein the target is hybridized to the partial duplexes 
under a single set of hybridization conditions. 

56. The method of claim 55 wherein the hybridization conditions comprise a 
temperature of between about 22-37°C, a salt concentration of between about 0.05-0.2 
M, and a time period of between about 1-14 hours. 

57. The method of claim 53 wherein the double-stranded portion contains an enzyme 
recognition site. 

58. The method of claim 53 wherein the target nucleic acid is obtained from a 
biological sample selected from the group consisting of samples of animal tissue, 
environmental substances, and manufacturing products and by-products. 

59. A nucleic acid probe created by the method of claim 53. 

60. The nucleic acid probe of claim 59 which is fixed to a solid support and the solid 
support is selected from the group consisting of plastics, ceramics, metals, resins, gels, 
membranes and chips. 

61. A diagnostic aid for the detection of a target nucleic acid in a biological sample 
comprising the nucleic acid probe of claim 59, a solid support on which the probe is 
fixed, a detectable label and the biological sample. 

62. An array of 4^R different nucleic acid probes wherein each probe comprises a 
double-stranded portion of length D, a terminal single-stranded portion of length S, and 
a random nucleotide sequence within the single-stranded portion of length R. 

63. The array of claim 62 wherein D is between about 3-20 nucleotides and S is 
between about 3-20 nucleotides. 

64. The array of claim 62 which is fixed to a solid support wherein the solid support 
is selected from the group consisting of plastics, ceramics, metals, resins, gels, 
membranes and two-dimensional and three-dimensional matrices. 
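
By way of illustration only, the size of the array recited in claim 62 can be checked with a short enumeration. The sketch below is not part of the claimed subject matter. It assumes, for simplicity, that the single-stranded portion consists entirely of the random sequence (S equal to R) and that the double-stranded portion is formed by a single constant sequence; the names build_probe_array and reverse_complement, and the example constant sequence, are hypothetical.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_probe_array(constant, r):
    """Enumerate one partial duplex per random sequence of length r.

    Assumption: the long strand carries the random sequence at its
    5'-terminus and the constant sequence at its 3'-terminus, and the
    short strand is complementary to the constant sequence only, so the
    random sequence remains single-stranded.
    """
    short_strand = reverse_complement(constant)
    probes = []
    for combo in product(BASES, repeat=r):
        random_part = "".join(combo)
        long_strand = random_part + constant  # written 5'->3'
        probes.append((long_strand, short_strand))
    return probes


# Hypothetical constant sequence (D = 9) with a random portion of R = 3.
probes = build_probe_array("GATTACAGG", 3)
print(len(probes))  # number of distinct probes, equal to 4**3
```

Each additional random position multiplies the number of probes by four, so the array size is fixed by R alone and does not depend on the length D of the double-stranded portion.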